I. QUASI-NEWTON ALGORITHMS FOR CONSTRAINED NONLINEAR PROGRAMMING

(M.  S.  Bazaraa) 

1.1  Introduction 

Nonlinear programming has long been of interest to mathematicians,
engineers, and management scientists. Recent developments in the field
of nonlinear programming, especially those related to computing a search
direction and to computing a stepsize, and the advent of high-speed,
large-memory computers have made it possible to numerically solve
nonlinear programming problems of great complexity. This capability has
not only motivated immense research in the development of nonlinear
programming methods, but also expanded their applications to problems in
optimal control, optimal design, nonlinear networks, chemical processing,
refinery operations and water resources management.

The  study  of  nonlinear  programming  methods  is  an  area  of  prime  interest. 
This  research  concerns  itself  with  the  development  of  nonlinear  programming 
methods  based  on  quadratic  approximation  of  the  objective  function  and 
linearization  of  the  constraints. 

A nonlinear programming problem can be stated as follows:

minimize $f(x)$
subject to $x \in S$

where $f$ is a function defined on $E^n$, $S$ is a subset of $E^n$, and $x$ is an
n-dimensional vector. The function $f$ and the set $S$ are usually called the
objective function and the feasible region, respectively. A decision
vector $x$ is called a feasible solution if $x \in S$. The nonlinear program aims
at finding a feasible solution $\bar{x}$ such that $f(x) \ge f(\bar{x})$ for each feasible
point $x$. Such a point $\bar{x}$ is called an optimal solution to the problem.

The set $S$ can be defined in terms of inequality and equality restrictions,
leading to the following general constrained nonlinear program:

P: minimize $f(x)$
   subject to $g_i(x) \le 0$, $i = 1, \ldots, m$
              $h_i(x) = 0$, $i = 1, \ldots, \ell$

Each of the constraints $g_i(x) \le 0$ for $i = 1, \ldots, m$ is called an inequality
constraint and each of the constraints $h_i(x) = 0$ for $i = 1, \ldots, \ell$ is called
an equality constraint. Most practical nonlinear programming problems have
the above form, and this research concerns itself with quadratic approximation
methods for solving this general constrained problem.

1.2 Quadratic Approximation Methods

In this section, we will briefly discuss the published literature on
quadratic approximation methods, commonly known as quasi-Newton or
Newton-type methods. The basis of these methods is to successively form a
quadratic subprogram by linearizing the original nonlinear constraints around
a given point and replacing the objective function with a suitable quadratic
form. The optimal solution to the quadratic subprogram is used to update
the current solution to the original problem.

This class of methods was originally proposed by Wilson [1963] and
further extended by several authors including Garcia and Mangasarian [1976],
Han [1976, 1977] and Powell [1978]. Perhaps the most important property
shared by these algorithms is that they enjoy a superlinear rate of
convergence in the vicinity of Kuhn-Tucker points that satisfy second order
optimality conditions. In [1977], Han showed that the optimal solution to
the quadratic problem is a descent direction for a suitable penalty
function. Through the use of a line search, he showed convergence of the
sequence of iterates even if the starting solution is remote from a
Kuhn-Tucker point, thus establishing global convergence.

1.2-1  General  Description  of  the  Algorithm 

In this section, we will provide a general description of the quadratic
approximation algorithm for solving a general constrained nonlinear
programming problem of the form

P: minimize $f(x)$
   subject to $g_i(x) \le 0$, $i = 1, \ldots, m$
              $h_i(x) = 0$, $i = 1, \ldots, \ell$

Each iteration consists of two major steps, namely, a direction finding step
and a line search step. In the direction finding step, a quadratic programming
subproblem is first formed. The solution to this quadratic program yields
a search direction. Once the direction is determined, a line search is
performed to produce a new point.

Suppose that at iteration k, the vectors $x^k \in E^n$, $u^k \in E^m$, $v^k \in E^\ell$ and an
$n \times n$ matrix $B_k$ are given. The following steps are successively performed.

Direction finding step

A quadratic subprogram $Q(x^k, B_k)$ is formulated as follows:

$Q(x^k, B_k)$: minimize $\nabla f(x^k)^t d + \tfrac{1}{2} d^t B_k d$
   subject to $g_i(x^k) + \nabla g_i(x^k)^t d \le 0$, $i = 1, \ldots, m$
              $h_i(x^k) + \nabla h_i(x^k)^t d = 0$, $i = 1, \ldots, \ell$

Note that the original nonlinear constraints are linearized around the point
$x^k$. Let $d^k$ be a solution of $Q(x^k, B_k)$. This vector will be called a search
direction or simply a direction. The dual vectors $p^k$ and $q^k$ are the Lagrangian
multipliers associated with the linear inequality and equality constraints
respectively, and will be used to update the Lagrangian multipliers of the
original problem P. Note that the construction of the constraints forces
the direction $d^k$ to point towards the feasible region. Particularly, if
$g_i(x^k) > 0$, that is, if the ith inequality constraint is violated, then the
ith constraint of the quadratic program will guarantee that
$\nabla g_i(x^k)^t d^k \le -g_i(x^k) < 0$. Therefore, moving along $d^k$ will reduce the
infeasibility of the ith constraint of the original problem. A similar
interpretation can be given for equality constraints.
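
The direction finding step can be illustrated with a minimal sketch. It assumes the special case $B_k = I$ with a single equality constraint and no inequality constraints, so that the quadratic subprogram has a closed-form solution from its Kuhn-Tucker system. The function name `qp_direction_identity` and the test problem are illustrative assumptions, not part of the algorithm description above.

```python
def qp_direction_identity(grad_f, h_val, grad_h):
    """Solve: minimize grad_f.d + 0.5*d.d  subject to  h_val + grad_h.d = 0.

    Assumes B_k = I and one equality constraint whose gradient is nonzero.
    Returns the direction d and the multiplier q of the linearized constraint.
    """
    aa = sum(a * a for a in grad_h)
    if aa == 0.0:
        raise ValueError("linearized constraint has a zero gradient")
    ag = sum(a * g for a, g in zip(grad_h, grad_f))
    # Stationarity gives d = -grad_f - q*grad_h; feasibility fixes q.
    q = (h_val - ag) / aa
    d = [-g - q * a for g, a in zip(grad_f, grad_h)]
    return d, q


# Illustrative problem: minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0,
# linearized at the (infeasible) point x = (-2, 0).
x = [-2.0, 0.0]
grad_f = [1.0, 1.0]
h_val = x[0] ** 2 + x[1] ** 2 - 2.0
grad_h = [2.0 * x[0], 2.0 * x[1]]
d, q = qp_direction_identity(grad_f, h_val, grad_h)
linearized_residual = h_val + sum(a * di for a, di in zip(grad_h, d))
```

The returned direction satisfies the linearized constraint exactly, so a unit step removes the first-order part of the infeasibility, which is the behavior described above for violated constraints.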

Line search step

Using a suitable descent function $\phi$, once the direction $d^k$ is determined,
a line search along it is performed, resulting in a stepsize $\lambda_k$ and
a new point $x^{k+1} = x^k + \lambda_k d^k$ such that $\phi(x^{k+1}) < \phi(x^k)$. In the vicinity
of a Kuhn-Tucker solution, as will be discussed later, superlinear convergence
is attained by simply letting $\lambda_k = 1$. For the purpose of the next
iteration, $u^{k+1}$ and $v^{k+1}$ are replaced with $p^k$ and $q^k$ respectively. These
vectors can also be used to form the matrix $B_{k+1}$, as will be discussed
later.
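
A minimal sketch of the line search step follows. It assumes an $\ell_1$ exact penalty function as the descent function $\phi$ and a simple halving rule that tries $\lambda = 1$ first; the penalty parameter `r`, the helper names, and the test data are assumptions made for illustration, and the acceptance test is the plain decrease condition $\phi(x^{k+1}) < \phi(x^k)$ rather than any particular author's rule.

```python
def l1_penalty(f, gs, hs, r):
    """Return phi(x) = f(x) + r * (sum max(0, g_i(x)) + sum |h_i(x)|)."""
    def phi(x):
        viol = sum(max(0.0, g(x)) for g in gs) + sum(abs(h(x)) for h in hs)
        return f(x) + r * viol
    return phi


def backtracking_step(phi, x, d, shrink=0.5, max_halvings=30):
    """Try lambda = 1 first, then shrink until phi strictly decreases."""
    phi0 = phi(x)
    lam = 1.0
    for _ in range(max_halvings):
        x_new = [xi + lam * di for xi, di in zip(x, d)]
        if phi(x_new) < phi0:
            return lam, x_new
        lam *= shrink
    raise RuntimeError("no decrease found along d")


# Illustrative data: minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0,
# starting at x = (-2, 0) with the direction d = (0.5, -1).
phi = l1_penalty(lambda x: x[0] + x[1], [],
                 [lambda x: x[0] ** 2 + x[1] ** 2 - 2.0], r=10.0)
x_old = [-2.0, 0.0]
lam, x_new = backtracking_step(phi, x_old, [0.5, -1.0])
```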

The algorithm starts with a point $x^1$, which is not required to be
feasible. Under certain assumptions, the algorithm terminates at a
Kuhn-Tucker point in a finite number of iterations or else generates an infinite
sequence $\{x^k\}$, any accumulation point of which is a Kuhn-Tucker point. We
note that the generated sequence $\{x^k\}$ may not be feasible, thus deviating
from conventional feasible direction methods as in the works of Zoutendijk
[1960] and Topkis and Veinott [1967].

We note that a linearly constrained subprogram can be used in place of
the quadratic subprogram. The solution to the linearly constrained problem
is used as the next iterate point $x^{k+1}$. We briefly discuss below the linearly
constrained programs proposed by Rosen and Kreuser [1972] and Robinson [1972].
Rosen and Kreuser's subprogram is as follows:

minimize $f(x) + \sum_{i=1}^{m} u_i^k g_i(x) + \sum_{i=1}^{\ell} v_i^k h_i(x)$
subject to $g_i(x^k) + \nabla g_i(x^k)^t (x - x^k) \le 0$, $i = 1, \ldots, m$
           $h_i(x^k) + \nabla h_i(x^k)^t (x - x^k) = 0$, $i = 1, \ldots, \ell$

The objective function is the Lagrangian function for problem P, and the
constraints are linear approximations to the original constraints.

Robinson used a slightly different objective function of the form:

\[
f(x) + \sum_{i=1}^{m} u_i^k \left[ g_i(x) - g_i(x^k) - \nabla g_i(x^k)^t (x - x^k) \right]
     + \sum_{i=1}^{\ell} v_i^k \left[ h_i(x) - h_i(x^k) - \nabla h_i(x^k)^t (x - x^k) \right]
\]

The main difference is that linear approximations to the original constraints
are subtracted from the Lagrangian objective function. When the original
problem is linearly constrained, the objective function proposed by Robinson
is equivalent to the original criterion function. This is not the case for
the method of Rosen and Kreuser unless, of course, $u^k = 0$ and $v^k = 0$.
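
The difference between the two objective functions can be checked numerically. The sketch below treats all constraints uniformly as pairs (function, gradient) with given multipliers; the helper names and the affine test constraint are assumptions chosen only to illustrate that Robinson's bracketed terms vanish when the constraints are affine, while the Lagrangian objective still differs from $f$.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def rosen_kreuser_objective(f, cons, mult):
    """Lagrangian objective f(x) + sum_i mult_i * c_i(x)."""
    return lambda x: f(x) + sum(m * c(x) for m, (c, _) in zip(mult, cons))


def robinson_objective(f, cons, mult, xk):
    """f(x) + sum_i mult_i * [c_i(x) - c_i(xk) - grad c_i(xk).(x - xk)]."""
    def obj(x):
        step = [xi - xki for xi, xki in zip(x, xk)]
        return f(x) + sum(m * (c(x) - c(xk) - dot(grad(xk), step))
                          for m, (c, grad) in zip(mult, cons))
    return obj


f = lambda x: x[0] ** 2 + x[1] ** 2
# One affine constraint c(x) = x1 + 2*x2 - 3 with constant gradient (1, 2).
cons = [(lambda x: x[0] + 2.0 * x[1] - 3.0, lambda x: [1.0, 2.0])]
mult, xk, x = [0.7], [1.0, -1.0], [0.5, 2.0]
rob = robinson_objective(f, cons, mult, xk)(x)
rk = rosen_kreuser_objective(f, cons, mult)(x)
```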

Line search is usually used to control the convergence of the generated
sequence $\{x^k\}$. However, if the point $x^k$ is sufficiently close to a solution
point $\bar{x}$, the new point $x^{k+1} = x^k + d^k$ satisfies $\|x^{k+1} - \bar{x}\| < \|x^k - \bar{x}\|$, so
that the distance function from $\bar{x}$ can itself be used as a descent function.
Hence the step size rule $\lambda_k = 1$ is useful in the vicinity of a solution
point. This rule has been used by Wilson [1963], Rosen and Kreuser [1971],
Robinson [1972], Garcia and Mangasarian [1976], Han [1976, 1977], and Powell
[1978]. If a starting point is far from a solution, the use of line search
is necessary to achieve global convergence.

Han  [1977],  and  Bazaraa  and  Goode  [1979]  used  line  search  in  the  context 
of  quadratic  approximation  methods  in  order  to  maintain  the  monotonic  decrease 
of  an  exact  penalty  function. 

We note that the algorithm under study can be thought of as an extension
of a certain class of descent algorithms for unconstrained optimization.
Particularly, in the absence of constraints, and by choosing the descent
function to be the objective function itself, various choices of $B_k$ lead to
distinct methods. If $B_k = I$, the algorithm is the method of steepest
descent. When the matrix $B_k$ is taken as the Hessian of the objective
function, the algorithm reduces to Newton's method. If updating schemes are
used to approximate the Hessian of the objective function, then the algorithm
turns out to be a quasi-Newton method.
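
The unconstrained special cases can be seen in a small sketch. Without constraints, the minimizer of $\nabla f(x^k)^t d + \tfrac{1}{2} d^t B_k d$ for positive definite $B_k$ is $d = -B_k^{-1} \nabla f(x^k)$. The two-variable solver and the test function $f(x) = x_1^2 + 5 x_2^2$ are assumptions used only for illustration.

```python
def solve_2x2(B, rhs):
    """Solve B d = rhs for a 2x2 matrix by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(rhs[0] * B[1][1] - B[0][1] * rhs[1]) / det,
            (B[0][0] * rhs[1] - rhs[0] * B[1][0]) / det]


def direction(B, grad):
    """Minimizer of grad.d + 0.5*d.B.d for positive definite B."""
    return solve_2x2(B, [-grad[0], -grad[1]])


# f(x) = x1^2 + 5*x2^2 at x = (1, 1): gradient (2, 10), Hessian diag(2, 10).
x = [1.0, 1.0]
grad = [2.0 * x[0], 10.0 * x[1]]
d_steepest = direction([[1.0, 0.0], [0.0, 1.0]], grad)   # B = I
d_newton = direction([[2.0, 0.0], [0.0, 10.0]], grad)    # B = Hessian
x_newton = [xi + di for xi, di in zip(x, d_newton)]
```

With $B_k = I$ the direction is the negative gradient, while with the exact Hessian a unit step lands on the minimizer of this quadratic function.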

1.2-2  The  Quadratic  Programming  Subproblem 

In this section, we will discuss various methods proposed for forming
the quadratic programming direction finding problem. The linearization of
all constraints is the common property of these methods. However, various
objective functions for the quadratic program have been proposed by several
authors. Particularly, the quadratic objective function at iteration k is
given by $\nabla f(x^k)^t d + \tfrac{1}{2} d^t B_k d$, where $B_k$ approximates the Hessian of the
objective function or the Lagrangian function

\[
L(x,u,v) = f(x) + \sum_{i=1}^{m} u_i g_i(x) + \sum_{i=1}^{\ell} v_i h_i(x)
\]

In this section, we will discuss some methods for computing and updating
the matrix $B_k$. These include exact computation, finite difference
approximation, and the use of quasi-Newton updates for the Hessian of the
Lagrangian function or the original objective function. Other choices of
interest are identity and diagonal matrices.

Exact  Computation  of  the  Hessian 

The matrix $B_k$ is taken as the Hessian of the objective function $\nabla^2 f(x^k)$
or the Hessian of the Lagrangian $\nabla_{xx} L(x^k, u^k, v^k)$ given by:

\[
\nabla_{xx} L(x^k, u^k, v^k) = \nabla^2 f(x^k) + \sum_{i=1}^{m} u_i^k \nabla^2 g_i(x^k) + \sum_{i=1}^{\ell} v_i^k \nabla^2 h_i(x^k).
\]

In [1963], Wilson used the Hessian of the Lagrangian function and was
able to show superlinear convergence of the algorithm. One disadvantage
caused by this choice, however, is the requirement that the Hessian be
determined at each iteration k. This involves the evaluation of $n^2 (1 + m + \ell)$
scalar functions even if all gradient vectors are given. For most functions
this operation is very costly. If the Hessian $\nabla_{xx} L(x,u,v)$ is relatively
easy to obtain and is positive definite, then this approach may prove
attractive. Keeping in mind the difficulties associated with solving a nonconvex
quadratic program, several methods have been proposed to maintain positive
definiteness of $B_k$ even if the Hessian $\nabla_{xx} L(x^k, u^k, v^k)$ is not. In [1967],
Greenstadt suggested

\[
B_k = \sum_{i=1}^{n} \beta_i b_i b_i^t
\]

where $\beta_i = \max\{|\alpha_i|, \delta\}$, $\delta$ is a positive scalar, $\alpha_i$ is the ith eigenvalue of
$\nabla_{xx} L(x^k, u^k, v^k)$ and $b_i$ is its corresponding eigenvector with $\|b_i\| = 1$. The
method of Levenberg-Marquardt is to let

\[
B_k = \nabla_{xx} L(x^k, u^k, v^k) + \beta I
\]

where $\beta$ is a positive scalar large enough to assure that $B_k$ is positive
definite. One particular implementation of this scheme is to attempt to use
Cholesky's factorization of $\nabla_{xx} L(x^k, u^k, v^k)$ into the form $LDL^t$, where $L$ is a
lower triangular matrix with ones on the diagonal and $D$ is a diagonal matrix
with positive diagonal elements. If $\nabla_{xx} L(x^k, u^k, v^k)$ is not positive definite,
the factorization would fail, but as described in Gill and Murray [1972], a
factorization of a modified matrix will be at hand. For other methods,
see Goldfeld, Quandt, and Trotter [1966], Fiacco and McCormick [1968], Gill and
Murray [1972], Mathews and Davies [1971], and Fletcher and Freeman [1977].
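
The Levenberg-Marquardt idea can be sketched as follows. This is a simplified stand-in, not the modified factorization of Gill and Murray: it uses the $LL^t$ form of the Cholesky factorization as a positive definiteness test and simply doubles the shift $\beta$ until the test passes. The starting shift, the pivot tolerance, and the function names are assumptions.

```python
import math


def cholesky(A, tol=1e-12):
    """Return lower-triangular L with A = L L^t, or None if a pivot is <= tol."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= tol:
            return None  # A is not (numerically) positive definite
        L[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def shifted_positive_definite(H, beta0=1.0):
    """Return (B, beta) with B = H + beta*I passing the Cholesky test."""
    n = len(H)
    beta = 0.0
    while True:
        B = [[H[i][j] + (beta if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        if cholesky(B) is not None:
            return B, beta
        beta = beta0 if beta == 0.0 else 2.0 * beta


H = [[1.0, 2.0], [2.0, 0.0]]   # indefinite: one eigenvalue is negative
B, beta = shifted_positive_definite(H)
```

Doubling the shift is crude and may overshoot the smallest admissible $\beta$; it is used here only because it keeps the sketch short.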

Finite  Difference  Approximation  of  the  Hessian 

If obtaining the Hessian $\nabla_{xx} L(x,u,v)$ or $\nabla^2 f(x)$ is relatively difficult,
a finite difference approximation to the Hessian can be used. This is done
as follows:

\[
(B_k)_{ij} = \frac{\nabla_x L(x^k + h e_j, u^k, v^k)_i - \nabla_x L(x^k, u^k, v^k)_i}{h}, \quad i, j = 1, \ldots, n
\]

where $h$ is a suitably chosen scalar, and $e_j$ denotes a unit vector whose jth
entry is one.

There is a significant amount of theoretical and computational support
for this approximation. For example, see Goldstein [1965], Stewart [1967],
Goldstein and Price [1967] and Dennis [1972]. The expense of computing
$n^2 (1 + m + \ell)$ scalar functions still remains and positive definiteness of $B_k$ is
not guaranteed.

A technique to reduce the overall computational effort is to hold the
matrix $B_k$ fixed for a certain number of iterations. This is practically useful
when the change of the Hessian is not significant. However, it is difficult
to decide how long the matrix should be held fixed. For details of this
technique, see Brent [1973].
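
A forward-difference approximation of this kind can be sketched in a few lines. The function name `fd_hessian`, the step `h = 1e-6`, and the quadratic test function are assumptions; the multipliers are taken as already folded into the gradient callable.

```python
def fd_hessian(grad, x, h=1e-6):
    """Forward differences: column j is (grad(x + h*e_j) - grad(x)) / h."""
    n = len(x)
    g0 = grad(x)
    B = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xh = list(x)
        xh[j] += h              # perturb only the jth coordinate
        gj = grad(xh)
        for i in range(n):
            B[i][j] = (gj[i] - g0[i]) / h
    return B


# Gradient of L(x) = x1^2 + 3*x1*x2 + 2*x2^2, whose Hessian is [[2, 3], [3, 4]].
grad_L = lambda x: [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 4.0 * x[1]]
B = fd_hessian(grad_L, [1.0, -2.0])
```

In general the resulting matrix is only approximately symmetric; a common remedy, not discussed in the text, is to average it with its transpose.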

Quasi-Newton  Updates 

To avoid calculating second derivatives, quasi-Newton updates have been
investigated by several authors. The basic scheme is of the form:

\[
B_{k+1} = B_k + D_k
\]

Here $D_k$ is called a correction matrix and is chosen to assure that $B_{k+1}$
satisfies the quasi-Newton equation:

\[
B_{k+1} s_k = y_k
\]

where $s_k = x^{k+1} - x^k$ and $y_k = \nabla_x L(x^{k+1}, u^{k+1}, v^{k+1}) - \nabla_x L(x^k, u^{k+1}, v^{k+1})$. First,
we discuss updates for dense and symmetric Hessian matrices. Later, we will
discuss updates for the sparse case.

Garcia  and  Mangasarian  [1976] 

Garcia and Mangasarian proposed a suitable update similar to those used
in quasi-Newton methods for unconstrained optimization. They used an updating
mechanism for an $(n+m+\ell) \times (n+m+\ell)$ matrix $H_k$ which approximates the Hessian of
the Lagrangian. The upper left $n \times n$ submatrix is used as the quadratic
form in the direction finding problem. To be specific, the updating scheme
is given below:

\[
H_{k+1} = H_k + \frac{1}{s_k^t C_k s_k} \left( y_k s_k^t C_k + C_k s_k y_k^t \right)
        - \frac{\theta \, s_k^t y_k}{(s_k^t C_k s_k)^2} \, C_k s_k s_k^t C_k
\]

where

\[
s_k = z^{k+1} - z^k, \qquad z^k = (x^k, u^k, v^k),
\]
\[
y_k = \nabla_z L(z^{k+1}) - \nabla_z L(z^k) - H_k s_k, \qquad \theta \in (0,1),
\]
\[
C_{k+1} =
\begin{cases}
I & \text{if } k+1 = 0 \bmod (n+m+\ell) \\
C_k - \dfrac{1}{s_k^t C_k s_k} C_k s_k s_k^t C_k & \text{otherwise}
\end{cases}
\]

The initial matrices $H_1$ and $C_1$ are equal to the $(n+m+\ell) \times (n+m+\ell)$ identity
matrices. Since $B_k$ is the upper left $n \times n$ submatrix of $H_k$, the scheme seems
to be wasteful, especially if the number of constraints is very large.
Furthermore, it does not guarantee that the matrix $B_k$ is positive semi-definite.

Han [1976]

As opposed to updating the overall Hessian of the Lagrangian, Han
proposed updating the Hessian $\nabla_{xx} L(x^k, u^k, v^k)$ only with respect to the vector x.
The updates are extensions of some well known double rank updates for
unconstrained optimization problems. The general formula is given below:

\[
B_{k+1} = B_k + \frac{(y_k - B_k s_k) c_k^t + c_k (y_k - B_k s_k)^t}{c_k^t s_k}
        - \frac{s_k^t (y_k - B_k s_k)}{(c_k^t s_k)^2} \, c_k c_k^t
\]

where $s_k = x^{k+1} - x^k$, $y_k = \nabla_x L(x^{k+1}, u^{k+1}, v^{k+1}) - \nabla_x L(x^k, u^{k+1}, v^{k+1})$, and $c_k$
is any vector with $c_k^t s_k \ne 0$. Even though the above formula updates the
Hessian of the Lagrangian only with respect to the x vector, it has the
disadvantage that it does not preserve positive definiteness.


Powell [1978]

Powell presented a quasi-Newton update which preserves positive definiteness
of the matrix $B_k$ even if the Hessian $\nabla_{xx} L(x,u,v)$ is itself not positive
definite. Powell's update can be thought of as an extension of the well
known BFGS formula given below.

\[
B_{k+1} = B_k - \frac{B_k s_k s_k^t B_k}{s_k^t B_k s_k} + \frac{y_k y_k^t}{s_k^t y_k}
\]

where $s_k = x^{k+1} - x^k$ and $y_k = \nabla_x L(x^{k+1}, u^{k+1}, v^{k+1}) - \nabla_x L(x^k, u^{k+1}, v^{k+1})$. If
the matrix $B_k$ is positive definite, then the matrix $B_{k+1}$ is also positive
definite provided that $s_k^t y_k > 0$ holds. However, Powell pointed out that
$s_k^t y_k > 0$ may not be satisfied due to the negative curvature of the Lagrangian
function. Rather than using $y_k$ in the third term of the BFGS formula,
Powell used the vector $w_k$ which is a convex combination of $y_k$ and $B_k s_k$.
The convex combination is chosen so that $s_k^t w_k > 0$ holds in all cases, thus
maintaining positive definiteness of $B_{k+1}$. This update is given below.

\[
B_{k+1} = B_k - \frac{B_k s_k s_k^t B_k}{s_k^t B_k s_k} + \frac{w_k w_k^t}{s_k^t w_k}
\]

where $w_k = \theta y_k + (1 - \theta) B_k s_k$, and

\[
\theta =
\begin{cases}
1 & \text{if } s_k^t y_k \ge 0.2 \, s_k^t B_k s_k \\
\dfrac{0.8 \, s_k^t B_k s_k}{s_k^t B_k s_k - s_k^t y_k} & \text{otherwise}
\end{cases}
\]
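
Powell's modification can be sketched directly from the formulas above. The helper names and the small test case with negative curvature ($s^t y < 0$) are assumptions chosen so that the damping branch is exercised; the name `w` for the damped vector is likewise an illustrative choice.

```python
def matvec(B, v):
    return [sum(bij * vj for bij, vj in zip(row, v)) for row in B]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def damped_bfgs(B, s, y):
    """BFGS update with y replaced by w = theta*y + (1 - theta)*B s."""
    Bs = matvec(B, s)
    sBs = dot(s, Bs)
    sy = dot(s, y)
    # theta = 1 keeps plain BFGS; otherwise damp so that s.w = 0.2 * s.B.s
    theta = 1.0 if sy >= 0.2 * sBs else 0.8 * sBs / (sBs - sy)
    w = [theta * yi + (1.0 - theta) * bsi for yi, bsi in zip(y, Bs)]
    sw = dot(s, w)
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + w[i] * w[j] / sw
             for j in range(n)] for i in range(n)]


# Negative curvature along s: plain BFGS would lose positive definiteness.
B_new = damped_bfgs([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [-1.0, 0.0])
det_new = B_new[0][0] * B_new[1][1] - B_new[0][1] * B_new[1][0]
```

In this example the undamped formula would divide by $s^t y = -1$ and produce an indefinite matrix, whereas the damped update stays positive definite.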


Sparse  and  Symmetric  Updates 

For sparse problems, the quasi-Newton updates discussed so far have
several drawbacks. First, because of symmetry, $n^2$ memory locations are
needed, which becomes impractical as n increases. Second, zero elements in
the Hessian of the Lagrangian will be approximated by generally nonzero
elements resulting from the updating formula. Finally, the update formulae
may waste a substantial computational effort in carrying out unnecessary
matrix and vector multiplications. Here we discuss sparse and symmetric
updates where the Hessian $\nabla_{xx} L(x,u,v)$ of the Lagrangian function or the
Hessian of the objective function has a known sparsity pattern.

Let J be the set of indices denoting the positions of the known zero
entries of the Hessian and let K be the set of all indices not in J.

In [1977, 1978], Toint proposed the sparse and symmetric update given as
follows:

First the vector $t_i$, $i = 1, \ldots, n$, is defined as follows:

\[
(t_i)_j =
\begin{cases}
(s_k)_j & \text{if } (i,j) \in K \\
0 & \text{otherwise}
\end{cases}
\]

An $n \times n$ matrix $\Phi$ is formed using the vectors $t_i$ as follows, where $\delta_{ij}$ is
the Kronecker delta.

\[
\Phi_{ij} = (t_i)_j (t_j)_i + \|t_i\|^2 \delta_{ij}, \quad i = 1, \ldots, n, \; j = 1, \ldots, n
\]

Note that $\Phi$ satisfies the sparsity conditions, and is symmetric and positive
definite provided that none of the vectors $t_i$, $i = 1, \ldots, n$ is identically zero.
Then

\[
(B_{k+1})_{ij} =
\begin{cases}
0 & \text{if } (i,j) \in J \\
\beta_i (s_k)_j + \beta_j (s_k)_i + (B_k)_{ij} & \text{otherwise}
\end{cases}
\]

where the vector $\beta$ is

\[
\beta = \Phi^{-1} (y_k - B_k s_k).
\]

Note that the above update satisfies the quasi-Newton equation. See Schubert
[1970] for an update of the Jacobian matrix for nonlinear systems of equations.
The interested reader may refer to Goldfarb [1970] for an update based on
the Cholesky decomposition, and to Marwil [1978] and Shanno [1980] for an update
based on Greenstadt's [1970] variational method.
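
A small numerical sketch of this sparse update follows. It assumes a tridiagonal sparsity pattern that includes the diagonal, a matrix $B_k$ that already respects the pattern, and a dense Gaussian elimination for the $n \times n$ system; all names and data are illustrative, and a practical code would exploit the sparsity of $\Phi$.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def toint_update(B, s, y, K):
    """Sparse symmetric secant update; K holds the (i, j) allowed to be nonzero."""
    n = len(s)
    t = [[s[j] if (i, j) in K else 0.0 for j in range(n)] for i in range(n)]
    Phi = [[t[i][j] * t[j][i] + (sum(v * v for v in t[i]) if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    resid = [y[i] - sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    beta = solve(Phi, resid)
    return [[B[i][j] + beta[i] * s[j] + beta[j] * s[i] if (i, j) in K else 0.0
             for j in range(n)] for i in range(n)]


n = 3
K = {(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1}  # tridiagonal
s, y = [1.0, 2.0, 3.0], [2.0, 1.0, 4.0]
B_old = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
B_new = toint_update(B_old, s, y, K)
secant_error = max(abs(sum(B_new[i][j] * s[j] for j in range(n)) - y[i])
                   for i in range(n))
```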

Special Choices of $B_k$

Here we will consider two special choices of $B_k$. When $B_k$ is chosen to
be the identity matrix, the subprogram $Q(x^k, B_k)$ is equivalent to the problem
of finding the least distance from the point $-\nabla f(x^k)$ to the feasible region
of the direction-finding problem. Several authors have provided efficient
methods to handle this special problem. For example, see the survey paper
by Cottle and Djang [1979]. Here we may expect that the direction $d^k$
produced by $Q(x^k, I)$ would be inferior to the direction produced by $Q(x^k, B_k)$
around the solution $\bar{x}$. However, the subprogram $Q(x^k, I)$ has some advantages.
One principal advantage is that this program is usually much easier to solve
than $Q(x^k, B_k)$. Another factor is that the program $Q(x^k, B_k)$ yields
superlinear convergence only in the vicinity of a solution point $\bar{x}$, but actually
has no theoretical advantage in early stages of the optimization process.
The use of the program $Q(x^k, I)$ can be interpreted as an extension of the
steepest descent method for unconstrained optimization.

Another choice is that each $B_k$ is taken as a diagonal matrix whose
diagonal entries approximate those of the Hessian of the Lagrangian function
or the objective function by finite difference methods. To be specific, let

(  V  l(xk+he, ,uk,vk) .  -  V  L(xk,uk,vk) . 

(Vii  ■  '• - — 2 — - 1 


where $h$ is a suitably chosen positive number and $e_i$ denotes an n-dimensional
unit vector whose ith entry is one. We note that the $(1 + m + \ell)$ gradient vectors
are evaluated to produce the diagonal matrix $B_k$ at each iteration. Note that
the matrix $B_k$ is positive definite, and $\|B_k\|$ and $\|B_k^{-1}\|$ are both bounded
if the gradient vectors are bounded. Other choices for the diagonal matrix
$B_k$ will be investigated.

We assume that S is nonempty and that $g_i(\cdot)$, $i = 1, \ldots, m$ and $h_i(\cdot)$, $i = 1, \ldots, \ell$
are continuously differentiable. Let $\hat{S}(x^k)$ denote the linearization of the
set S at the point $x^k$ so that

\[
\hat{S}(x^k) = \{ x \mid g_i(x^k) + \nabla g_i(x^k)^t (x - x^k) \le 0, \; i = 1, \ldots, m, \;\;
                  h_i(x^k) + \nabla h_i(x^k)^t (x - x^k) = 0, \; i = 1, \ldots, \ell \}
\]

Note that the feasible region of the quadratic program $Q(x^k, B_k)$ is nonempty
only if $\hat{S}(x^k)$ is nonempty. If the latter is empty, then the quadratic program
is inconsistent and the quadratic approximation algorithm will stop prematurely.
This point is illustrated by the following example.

Example 1: minimize $x_1 + x_2$
           subject to $h_1(x) = x_1^2 + x_2^2 - 2 = 0$

Note that the feasible region of the problem is nonempty and that the optimal
solution $\bar{x}$ is $(-1, -1)^t$. Let $B_k = I$ and consider the quadratic subprogram
at the point $x^k = (0,0)^t$ given below:

minimize $(d_1 + d_2) + \tfrac{1}{2} (d_1^2 + d_2^2)$
subject to $-2 = 0$

Clearly this problem is inconsistent and would result in premature
termination of the algorithm.
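
The inconsistency can be checked mechanically. The sketch below handles only the case of a single linearized equality constraint, where the linear equation $h(x^k) + \nabla h(x^k)^t d = 0$ has a solution unless the gradient vanishes while the constraint value does not; the function name and tolerance are assumptions.

```python
def linearized_equality_is_consistent(h_val, grad_h, tol=1e-12):
    """h_val + grad_h.d = 0 is solvable unless grad_h = 0 while h_val != 0."""
    if any(abs(a) > tol for a in grad_h):
        return True                 # a nonzero gradient always admits some d
    return abs(h_val) <= tol        # zero gradient: need h_val = 0 already


h = lambda x: x[0] ** 2 + x[1] ** 2 - 2.0
grad_h = lambda x: [2.0 * x[0], 2.0 * x[1]]

at_origin = linearized_equality_is_consistent(h([0.0, 0.0]), grad_h([0.0, 0.0]))
elsewhere = linearized_equality_is_consistent(h([1.0, 0.0]), grad_h([1.0, 0.0]))
```

For this constraint the origin is the only point at which the linearization fails, since it is the only point where the gradient of $h_1$ vanishes.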

In the vicinity of a Kuhn-Tucker point satisfying the second order
sufficiency optimality conditions, the region $\hat{S}(x^k)$ is nonempty. If the
point $x^k$ is feasible, the region $\hat{S}(x^k)$ is indeed nonempty because $d = 0$ is
feasible. However, if the point $x^k$ is infeasible and remote from the
solution point, we must provide a resolution to the case where the region $\hat{S}(x^k)$
is empty. Han [1977] provided a sufficient condition to assure that the
region $\hat{S}(x^k)$ is nonempty. The result is summarized in the following lemma.

Lemma 1

Let $g_i$, $i = 1, \ldots, m$ be continuously differentiable and convex, and
$h_i$, $i = 1, \ldots, \ell$ be affine. If the set $\{x \mid g_i(x) \le 0, \; i = 1, \ldots, m, \; h_i(x) = 0,
\; i = 1, \ldots, \ell\}$ is nonempty, then $\hat{S}(x^k)$ is nonempty for any $x^k \in E^n$.

Clearly, this sufficient condition is very restrictive. Bazaraa and
Goode [1979] introduced artificial variables to prevent the constraint set from
being empty. Through the use of a penalty term, these artificial variables
will be equal to zero, unless of course the region $\hat{S}(x^k)$ is itself empty.
This quadratic program is given below:

$D(x^k, B_k)$: minimize $\nabla f(x^k)^t d + \tfrac{1}{2} d^t B_k d + r \left[ \sum_{i=1}^{m} y_i + \sum_{i=1}^{\ell} (z_i^+ + z_i^-) \right]$
   subject to $g_i(x^k) + \nabla g_i(x^k)^t d \le y_i$, $i = 1, \ldots, m$
              $h_i(x^k) + \nabla h_i(x^k)^t d = z_i^+ - z_i^-$, $i = 1, \ldots, \ell$
              $y_i \ge 0$, $i = 1, \ldots, m$
              $z_i^+, z_i^- \ge 0$, $i = 1, \ldots, \ell$

where r is a sufficiently large positive number. The introduction of the
artificial variables $y_i$, $z_i^+$ and $z_i^-$ assures that the feasible region of
$D(x^k, B_k)$ is maintained nonempty. However, we will show through a simple
example that quasi-Newton updates of $B_k$ are inadequate in this case unless
some additional considerations are taken into account.

Example 2: We will reconsider Example 1.

minimize $x_1 + x_2$
subject to $x_1^2 + x_2^2 - 2 = 0$
           $x \in E^2$

Let the point $x^k = (0,0)^t$ and $B_k = I$. Then we get the quadratic program
$D(x^k, B_k)$ given below:

minimize $d_1 + d_2 + \tfrac{1}{2} (d_1^2 + d_2^2) + r (y^+ + y^-)$
subject to $-2 = y^+ - y^-$
           $y^+ \ge 0$, $y^- \ge 0$

The optimal solution to the above problem is

$d^k = (-1, -1)^t$, $y^+ = 0$, $y^- = 2$

The Lagrangian multiplier q associated with the linear equality constraint is

q = r

Note that the Lagrangian multiplier at the optimal solution $\bar{x} = (-1,-1)^t$
is $\tfrac{1}{2}$. If r is sufficiently large, the estimate q of the Lagrangian multiplier
is unnecessarily large. The Lagrangian function will thus be

\[
L(x,q) = x_1 + x_2 + r (x_1^2 + x_2^2 - 2)
\]

which means that a big penalty is imposed on the constraint because it was
inconsistent at the point $x^k = (0,0)^t$. The unnecessarily large number q
may result in ill-conditioning of the next iterate, as in penalty function
methods. We note here that the choice of $B_k$ in Bazaraa and Goode [1979] does
not depend on the estimates of the Lagrangian multipliers. When an update
of $B_k$ is applied, one approach is to keep the values of the Lagrangian
multipliers corresponding to the inconsistent constraints fixed rather than
replacing them with the Lagrangian multipliers produced by the quadratic
subprogram $D(x^k, B_k)$. In this study, we will investigate the subprogram $D(x^k, B_k)$
further.

Another approach is to eliminate some inconsistent constraints. Let
$I(x^k) = \{ i \mid \|\nabla g_i(x^k)\| \ne 0 \}$ and $J(x^k) = \{ i \mid \|\nabla h_i(x^k)\| \ne 0 \}$. Then we have
the following linear system to represent the feasible region of the quadratic
subprogram $Q(x^k, B_k)$:

$g_i(x^k) + \nabla g_i(x^k)^t d \le 0$, $i \in I(x^k)$
$h_i(x^k) + \nabla h_i(x^k)^t d = 0$, $i \in J(x^k)$

We will investigate some sufficient conditions to guarantee that the above
system has a solution.

1.2-4 Updating the Lagrangian Multipliers

In this section, we will discuss updating the Lagrangian multipliers.
The estimates $u^{k+1}$ and $v^{k+1}$ of the Lagrangian multipliers may be used to
determine the matrix $B_{k+1}$ if $B_{k+1}$ is chosen to approximate the Hessian of
the Lagrangian function. Here we will discuss the updating scheme employed
by most authors and then discuss some variations to be investigated further.
The most popular updating scheme is given below:

\[
u^{k+1} = p^k \quad \text{and} \quad v^{k+1} = q^k
\]

where $p^k$ and $q^k$ are the Lagrangian multipliers obtained from problem
$Q(x^k, B_k)$. Note that since $p^k \ge 0$, the nonnegativity of $u^{k+1}$ is automatically
maintained. This scheme has the advantage that if the sequence $\{x^k\}$
converges to a Kuhn-Tucker point $\bar{x}$, the estimates $u^k$ and $v^k$ converge to the
vectors $\bar{u}$ and $\bar{v}$ of the Lagrangian multipliers, respectively. Under this
method, the dual solution $(p^k, q^k)$ may affect the numerical stability of the
matrix $B_{k+1}$. If the length of the vector $(p^k, q^k)$ is unnecessarily large,
the next iterate may suffer from ill-conditioning. This situation may
arise if $Q(x^k, B_k)$ is inconsistent and if the search direction is obtained
by solving $D(x^k, B_k)$ as explained in Example 2 in Section 1.2-3.

Han [1977] presented a sufficient condition that the $\infty$-norm of the dual
solution $(p^k, q^k)$ is bounded by a certain positive number. The result is
summarized in the following theorem.

Theorem 1

Let f and $g_i$, $i = 1, \ldots, m$ be continuously differentiable, $g_i$, $i = 1, \ldots, m$
be convex, and $h_i$, $i = 1, \ldots, \ell$ be affine. Suppose that the feasible region of
the original problem P is nonempty. Further, suppose that the matrix $B_k$
satisfies the following condition:

\[
\delta_1 \|d\|^2 \le d^t B_k d \le \delta_2 \|d\|^2 \quad \text{for any } d \in E^n, \text{ for all } k
\]

Then there exists $\bar{r} > 0$ such that if $(p^k, q^k)$ is a dual solution to $Q(x^k, B_k)$
then the $\infty$-norm of the dual solution $(p^k, q^k)$ is bounded by $\bar{r}$ for each k.

The sufficient condition seems restrictive mainly because of convexity
of the inequality constraints and linearity of the equality constraints.
Since the number $\bar{r}$ is unknown a priori, there still remains the possibility
of ill-conditioning of the matrix $B_{k+1}$ if $\bar{r}$ is sufficiently large.

Revising the Updating Scheme

Let $d^k$ be a solution to $Q(x^k, B_k)$. Then the dual vector $(p^k, q^k)$ solves
the following system:

$$\nabla f(x^k) + B_k d^k + \sum_{i=1}^{m} p_i \nabla g_i(x^k) + \sum_{i=1}^{\ell} q_i \nabla h_i(x^k) = 0$$

$$p_i \left( g_i(x^k) + \nabla g_i(x^k)^t d^k \right) = 0, \quad i = 1,\dots,m$$

$$p_i \ge 0, \quad i = 1,\dots,m$$

Note that the system may not have a unique solution. In particular, we are
interested in finding a solution $(p^k, q^k)$ with minimum $\infty$-norm to prevent the
possibility of ill-conditioning of the matrix $B_{k+1}$. Furthermore, we will
investigate other updating rules. One such rule is:

$$u_i^{k+1} = \max\{0,\ u_i^k + \delta g_i(x^{k+1})\}$$

$$v_i^{k+1} = v_i^k + \delta h_i(x^{k+1})$$

where $\delta$ is a suitably chosen positive number. This method can be interpreted
as a subgradient optimization scheme in which a fixed step along the
subgradient $(g(x^{k+1}), h(x^{k+1}))$ of the Lagrangian function is taken, and
any negative components of the Lagrangian multipliers of the
inequality constraints are then set equal to zero.
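
The fixed-step updating rule above can be sketched in a few lines. This is only an illustration: the function name `update_multipliers` and its argument layout are our own assumptions, and the constraint values at $x^{k+1}$ are taken as given inputs rather than computed from a particular problem.

```python
def update_multipliers(u, v, g_next, h_next, delta):
    """One fixed-step multiplier update (illustrative sketch).

    u, v    : current multipliers for the inequality / equality constraints
    g_next  : values g_i(x^{k+1}) of the inequality constraints
    h_next  : values h_i(x^{k+1}) of the equality constraints
    delta   : fixed positive step size (assumed to be supplied by the user)
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    # Step along the subgradient, then reset negative components to zero.
    u_new = [max(0.0, ui + delta * gi) for ui, gi in zip(u, g_next)]
    # Equality multipliers are unrestricted in sign, so no reset is needed.
    v_new = [vi + delta * hi for vi, hi in zip(v, h_next)]
    return u_new, v_new


# Hypothetical data: two inequality constraints and one equality constraint.
u_new, v_new = update_multipliers([1.0, 0.5], [0.0], [-4.0, 1.0], [0.5], 0.5)
print(u_new, v_new)
```

In this made-up example the first inequality constraint is strictly satisfied, so its multiplier is driven below zero and reset to zero, while the violated second constraint has its multiplier increased.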

1.2-5  Local  Convergence 

One  of  the  key  advantages  of  quadratic  approximation  methods  is  the 
fact  that  they  enjoy  a  superlinear  rate  of  convergence  in  the  vicinity  of 
a  Kuhn-Tucker  point  satisfying  second  order  sufficiency  conditions.  In  this 
section,  we  will  discuss  the  major  results  and  assumptions  which  guarantee 
superlinear  convergence. 

First,  we  review  the  second  order  sufficiency  condition  which  was  first 
studied  by  Fiacco  and  McCormick  [1968]. 

Definition

A Kuhn-Tucker triple $(\bar{x}, \bar{u}, \bar{v})$ of problem P satisfies the second order
sufficiency conditions if the following conditions are simultaneously satisfied:

(i) $\bar{u}_i > 0$ if $i \in I(\bar{x})$, where $I(\bar{x}) = \{j : g_j(\bar{x}) = 0\}$.

(ii) The set $N$, the collection of the gradient vectors $\nabla g_i(\bar{x})$, $i \in I(\bar{x})$,
and $\nabla h_i(\bar{x})$, $i = 1,\dots,\ell$, is linearly independent.

(iii) The Hessian $\nabla_{xx} L(\bar{z})$ is positive definite on the tangent subspace
$T = \{y : y^t d = 0,\ d \in N\}$.

Local convergence can be established through the use of a contraction
mapping defined on a sufficiently small ball $B_\epsilon(\bar{z}) = \{z : \|z - \bar{z}\| \le \epsilon\}$,
where $\bar{z}$ denotes a Kuhn-Tucker triple satisfying the second order sufficiency
conditions. The following theorem summarizes the main local convergence
result of the algorithm.

Theorem 2

Let $\bar{z} = (\bar{x}, \bar{u}, \bar{v})$ be a Kuhn-Tucker triple of problem P. Suppose that $\bar{z}$
satisfies the second order sufficiency condition, and that $f$, $g_i$ $(i = 1,\dots,m)$,
$h_i$ $(i = 1,\dots,\ell)$ have second derivatives which are Lipschitz continuous at the
point $\bar{x}$. Then for $r \in (0,1)$, there exist positive numbers $\epsilon$ and $\delta$ such that
if $\|z^k - \bar{z}\| < \epsilon$ and $\|B_k - \nabla_{xx} L(\bar{z})\| < \delta$ at the point $z^k = (x^k, u^k, v^k)$, there
exists a closest solution $(d^k, u^{k+1}, v^{k+1})$ of $D(x^k, B_k)$ to $(0, u^k, v^k)$ such that

$$\|z^{k+1} - \bar{z}\| \le r \|z^k - \bar{z}\|$$

where $z^{k+1} = (x^k + d^k, u^{k+1}, v^{k+1})$.

Proof

See Han [1976].

We note that the theorem holds only when $z^k$ and $B_k$ are sufficiently
close to $\bar{z}$ and $\nabla_{xx} L(\bar{z})$, respectively. Obviously, since
$\|z^{k+1} - \bar{z}\| \le r \|z^k - \bar{z}\|$, the convergence is guaranteed. However, as we
will discuss later in the section, a fast rate of convergence, characterized
by superlinear convergence, is actually realized.

For the discussion of superlinear convergence, we present the following
definitions of linear and superlinear convergence.

Let $\{z^k\}$ converge to $\bar{z}$. Then the sequence $\{z^k\}$ is said to converge
linearly if there exist an $r \in (0,1)$ and $k_0 \ge 0$ such that

$$\|z^{k+1} - \bar{z}\| \le r \|z^k - \bar{z}\| \quad \text{for all } k \ge k_0$$

If there exists a sequence $\{\gamma_k\}$ convergent to zero such that

$$\|z^{k+1} - \bar{z}\| \le \gamma_k \|z^k - \bar{z}\|$$

then the sequence $\{z^k\}$ is said to converge superlinearly. If $\{z^k\}$ converges
superlinearly to $\bar{z}$, then

$$\lim_{k \to \infty} \frac{\|z^{k+1} - z^k\|}{\|z^k - \bar{z}\|} = 1$$

provided that $z^k \ne \bar{z}$. However, the converse is not true. For more details
on superlinear convergence properties, refer to Dennis and Moré [1974, 1977]
and Ortega and Rheinboldt [1970].
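
The two definitions can be told apart numerically by looking at the ratios $\|z^{k+1} - \bar{z}\| / \|z^k - \bar{z}\|$. The sketch below is our own illustration on two made-up scalar sequences with limit zero: the ratios stay at a constant $r$ in the linear case and tend to zero in the superlinear case. The helper name `error_ratios` is hypothetical.

```python
import math


def error_ratios(iterates, limit):
    """Return ||z^{k+1} - limit|| / ||z^k - limit|| for consecutive iterates."""
    def dist(z):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, limit)))

    errors = [dist(z) for z in iterates]
    return [e1 / e0 for e0, e1 in zip(errors, errors[1:]) if e0 > 0.0]


limit = (0.0,)

# Linear convergence: the error is halved at every step, so r = 0.5.
linear_iterates = [(0.5 ** k,) for k in range(6)]
linear_ratios = error_ratios(linear_iterates, limit)

# Superlinear convergence: the error is 1/k!, so the ratio 1/(k+1) tends to 0.
superlinear_iterates = [(1.0 / math.factorial(k),) for k in range(1, 8)]
superlinear_ratios = error_ratios(superlinear_iterates, limit)

print(linear_ratios)
print(superlinear_ratios)
```

A finite list of ratios can only suggest a rate, of course; the definitions above concern the behavior as $k$ tends to infinity.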

To obtain the linear and superlinear rates of convergence, several sufficient
conditions have been provided. The conditions are mainly based on the
absolute and relative error of approximations to the Hessian, measured by some
fixed matrix norms. A sufficient condition for the linear rate of convergence
is that $\|B_k - \nabla_{xx} L(\bar{z})\| < \delta$. Here $\|\cdot\|$ denotes any fixed matrix norm and $\delta$ is
a sufficiently small positive number. The interested reader may refer to
Garcia and Mangasarian [1976], and Han [1976]. A sufficient condition for the
superlinear rate of convergence is that

$$\lim_{k \to \infty} \frac{\|(B_k - \nabla_{xx} L(\bar{z}))(x^{k+1} - x^k)\|}{\|z^{k+1} - z^k\|} = 0$$

This condition is credited to Han [1976]. For similar conditions, refer to
Garcia and Mangasarian, and Powell [1978]. We note that if $\|B_k - \nabla_{xx} L(\bar{z})\|$
converges to zero, then the sequence $\{z^k\}$ converges superlinearly to $\bar{z}$.
The reader may easily note that the methods of Wilson [1963], Robinson [1972]
and the finite difference procedure are superlinearly convergent because
$\lim_{k \to \infty} \|B_k - \nabla_{xx} L(\bar{z})\| = 0$. However, the condition
$\lim_{k \to \infty} \|B_k - \nabla_{xx} L(\bar{z})\| = 0$ is not necessary for superlinear convergence.


1.2-6 Global Convergence

In this section, we will discuss global convergence of quadratic approximation
algorithms employing line search. As mentioned before, in the vicinity
of a Kuhn-Tucker point which satisfies the second order sufficiency condition, the
distance function from the Kuhn-Tucker point can be used as a descent function,
thus establishing convergence. If a starting point is remote from the
Kuhn-Tucker point, a line search scheme employing a suitable descent function is
needed to achieve convergence. The choice of descent functions and their
convergence results will be discussed in this section.

An  Exact  Penalty  Function 

A successful descent function is the penalty function $\phi_r(x)$ of the form

$$\phi_r(x) = f(x) + r \left[ \sum_{i=1}^{m} \max\{0, g_i(x)\} + \sum_{i=1}^{\ell} |h_i(x)| \right]$$

The parameter $r$ will be called an exact penalty parameter. The function was
first used as a descent function in the context of quadratic approximation
methods by Han [1977]. In [1979], Bazaraa and Goode simplified their minimax
algorithm to directly handle the penalty function problem to minimize $\phi_r(x)$.
The algorithms of Han and of Bazaraa and Goode are discussed below. Both
algorithms are globally convergent in the sense that each accumulation point of
the sequence $\{x^k\}$ is a Kuhn-Tucker point. Both algorithms have the form
$x^{k+1} = x^k + \lambda_k d^k$, where $d^k$ is obtained from solving a quadratic program and
$\lambda_k$ is obtained by a suitable line search scheme. Han [1977] showed that the
direction $d^k$ obtained from the quadratic programming problem $Q(x^k, B_k)$ is
indeed a descent direction for the exact penalty function. The line search
along the direction $d^k$ is performed as follows:

$$\phi_r(x^{k+1}) \le \min_{0 \le \lambda \le \delta} \phi_r(x^k + \lambda d^k) + \epsilon_k$$

where $\delta$ is a prescribed positive number and $\epsilon_k$ is an error term allowed for
the line search.

We note that since the function $\phi_r(x)$ is nondifferentiable, derivative-based
search methods cannot be applied directly.
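
To make the penalty function concrete, the following sketch evaluates $\phi_r$ for user-supplied callables. The builder name `exact_penalty` and the two-variable toy problem are our own assumptions, chosen only to show how the two penalty sums enter.

```python
def exact_penalty(f, gs, hs, r):
    """Build phi_r(x) = f(x) + r * (sum_i max(0, g_i(x)) + sum_i |h_i(x)|)."""
    def phi(x):
        inequality_violation = sum(max(0.0, g(x)) for g in gs)
        equality_violation = sum(abs(h(x)) for h in hs)
        return f(x) + r * (inequality_violation + equality_violation)
    return phi


# Hypothetical toy problem:
#   minimize x0^2 + x1^2  subject to  1 - x0 <= 0  and  x0 - x1 = 0.
phi = exact_penalty(
    f=lambda x: x[0] ** 2 + x[1] ** 2,
    gs=[lambda x: 1.0 - x[0]],
    hs=[lambda x: x[0] - x[1]],
    r=10.0,
)

print(phi((0.0, 0.0)))  # infeasible point: the inequality violation is penalized
print(phi((1.0, 1.0)))  # feasible point: phi_r coincides with f
```

The function has kinks wherever some $g_i(x) = 0$ or $h_i(x) = 0$, which is the nondifferentiability referred to above.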

Bazaraa and Goode [1979]

Their algorithm was originally designed to solve minimax problems. Hence
the algorithm can be specialized to minimize the exact penalty function. The
corresponding quadratic subprogram $D(x^k, B_k)$ is of the form

$$D(x^k, B_k): \quad \text{minimize} \quad \nabla f(x^k)^t d + r \left[ \sum_{i=1}^{m} y_i + \sum_{i=1}^{\ell} (z_i^+ + z_i^-) \right] + \frac{1}{2} d^t B_k d$$

$$\text{subject to} \quad g_i(x^k) + \nabla g_i(x^k)^t d \le y_i, \quad i = 1,\dots,m$$

$$h_i(x^k) + \nabla h_i(x^k)^t d = z_i^+ - z_i^-, \quad i = 1,\dots,\ell$$

$$y_i \ge 0, \quad i = 1,\dots,m$$

$$z_i^+,\ z_i^- \ge 0, \quad i = 1,\dots,\ell$$

Note that each subprogram $D(x^k, B_k)$ has a nonempty feasible region. They
specialized the Armijo [1964] search rule under the assumption that $f$, $g_i$,
$i = 1,\dots,m$, and $h_i$, $i = 1,\dots,\ell$, are upper uniformly differentiable. Each
$\lambda_k$ is determined by:

$$\lambda_k = \left(\tfrac{1}{2}\right)^{m_k}$$

where $m_k$ is the smallest nonnegative integer such that

$$\phi_r\!\left(x^k + \left(\tfrac{1}{2}\right)^{m_k} d^k\right) \le \phi_r(x^k) + \left(\tfrac{1}{2}\right)^{m_k + 1} \nabla^* \phi_r(x^k, d^k)$$

where

$$\nabla^* \phi_r(x^k, d^k) = \nabla f(x^k)^t d^k + r \left( \sum_{i=1}^{m} y_i + \sum_{i=1}^{\ell} (z_i^+ + z_i^-) \right) - r \left( \sum_{i=1}^{m} \max\{0, g_i(x^k)\} + \sum_{i=1}^{\ell} |h_i(x^k)| \right)$$
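
The step-halving rule above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the quantity playing the role of $\nabla^* \phi_r(x^k, d^k)$ is passed in as the number `predicted` (assumed negative), the cap `max_halvings` is our own safeguard, and the test function is a made-up one-variable example.

```python
def armijo_step(phi, x, d, predicted, max_halvings=50):
    """Return lambda = (1/2)**m for the smallest m >= 0 such that
    phi(x + lambda*d) <= phi(x) + 0.5 * lambda * predicted.

    predicted is a negative number standing in for the pseudo-directional
    derivative; it is an input here, not something this sketch computes.
    """
    if predicted >= 0:
        raise ValueError("predicted must be negative for a descent direction")
    phi_x = phi(x)
    step = 1.0
    for _ in range(max_halvings + 1):
        trial = [xi + step * di for xi, di in zip(x, d)]
        # (1/2)**(m+1) * predicted equals 0.5 * step * predicted.
        if phi(trial) <= phi_x + 0.5 * step * predicted:
            return step
        step *= 0.5
    raise RuntimeError("no acceptable step found within max_halvings")


# Hypothetical nondifferentiable test function phi(x) = |x0|, starting at
# x = 1 and moving along d = -4, whose directional derivative there is -4.
step = armijo_step(lambda x: abs(x[0]), [1.0], [-4.0], predicted=-4.0)
print(step)
```

Only function values of $\phi_r$ are needed in the loop, which is what makes a rule of this kind usable for a nondifferentiable descent function.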


The two algorithms can be interpreted as an exact penalty function method
which attempts to minimize a single unconstrained penalty function $\phi_r(x)$, resulting
in a solution to problem P. This exact penalty function approach was first
introduced by Fletcher [1970], who transformed the original problem into a
completely unconstrained program. The basic idea is that if $\bar{x}$ is a Kuhn-Tucker
point to problem P, there exists a number $\bar{r}$ such that $\bar{x}$ is a local
optimal solution to the problem to minimize $\phi_r(x)$ for all $r \ge \bar{r}$. The lower
bound $\bar{r}$ is estimated by the Lagrangian multipliers. For a review of exact
penalty functions, the reader may refer to Pietrzykowski [1969], Evans, Gould
and Tolle [1973], Howe [1973], Conn [1973], Conn and Pietrzykowski [1973],
and Fletcher [1975]. For the existence of a globally exact penalty function
in the convex case and in the nonconvex case, refer to Bertsekas [1975],
Bazaraa and Goode [1979], and Han and Mangasarian [1979].


then there exists a number $\lambda_0$ so that $\bar{x}$ is a local optimal solution to the
problem:

$$\text{minimize} \quad f(x) + \lambda \sum_{i=1}^{m} \max\{0, g_i(x)\}$$

for all $\lambda \ge \lambda_0$. Unfortunately, however, in the absence of convexity, the
above result does not hold globally.

In this paper we show, under mild conditions, that if a compact constraint
set $X$ is added to the constraints $g_i(x) \le 0$ for $i = 1,\dots,m$, then the set of
global optimal solutions to the original problem and the set of global optimal
solutions to the penalty problem, for a sufficiently large penalty parameter
$\lambda$, are equivalent. In order to prove this result, we use the fact that a
family of relatively open sets that cover $X$ must have a finite subcover. An
estimate of the size of the penalty parameter is also given.

Minimax and Quasi-Newton Algorithms

An algorithm for solving a minimax problem over a closed convex set is
developed. Using a newly developed continuous pseudo-directional derivative,
a direction is found by minimizing a positive-semidefinite quadratic program
over the feasible region. A step size is then computed using an extension of
Armijo's inexact line search.

The algorithm is specialized to both unconstrained and constrained nonlinear
programs. For the unconstrained case, various steepest descent and
quasi-Newton methods are produced through different choices of the quadratic
form. Using an exact penalty function to handle the nonlinear constraints,
the direction-finding problem reduces to a convex quadratic programming problem.
Unlike other available direction-finding routines that linearize the
nonlinear constraints, our program is always feasible. A suitable step size
is then found using Armijo's rule. It is shown that accumulation points of
the algorithm are indeed Kuhn-Tucker points to the original problem.

Algorithm  for  Linearly  Constrained  Nonlinear  Programs 

Here an algorithm for solving a linearly constrained nonlinear program
is developed. Given a feasible solution, to avoid jamming, binding and
near-binding constraints are identified. A direction is calculated by solving
a least distance programming problem which is defined in terms of the
gradients of these constraints.

Once  a  direction  is  found,  an  estimate  of  the  step  size,  using  quadratic 
approximation  of  the  objective  function,  is  first  computed.  This  estimate  is 
then  used  in  conjunction  with  Armijo's  inexact  line  search  to  calculate  a  new 
point.  It  is  shown  that  each  accumulation  point  is  a  Kuhn-Tucker  solution 
to  a  slight  perturbation  of  the  original  problem.  Under  suitable  second  order 
optimality  conditions,  we  show  that  eventually  one  functional  evaluation  is 
needed to compute the step size.

II. GENERIC OPTIMALITY CONDITIONS AND NONDIFFERENTIABLE OPTIMIZATION

(J. Spingarn)

II.1 Introduction


Our  research  during  the  period  covered  by  this  contract  has  centered 
on  two  themes,  both  within  the  compass  of  mathematical  programming:  generic 
optimality  conditions  and  nondifferentiable  optimization. 

II.1-1 Generic Optimality Conditions

Our work on generic conditions continued the investigation that was
begun in Spingarn and Rockafellar [5]. In that paper, it had been shown
that for almost all $(v,u) \in R^{n+m}$, at every local minimizer for the problem

    Q(v,u):  minimize $f(x) - x \cdot v$ over all $x \in R^n$
             satisfying $g_i(x) \le u_i$ for all $i = 1,\dots,m$

the so-called "strong second-order optimality conditions" hold (assuming that
the functions $f$ and $g_i$ possess derivatives of sufficiently high order). In
this sense, the strong second-order conditions are "generically" necessary
for (local) optimality with respect to the class $Q(v,u)$.

When  studying  questions  of  genericity,  the  precise  class  of  problems 
to  which  the  results  apply  is  crucial.  The  family  Q(v,u)  is  only  one  example 
of  a  family  for  which  the  conditions  are  generic.  So  the  question  naturally 
arises:  For  what  other  families  will  the  strong  second-order  conditions, 
or  similar  conditions,  be  generically  necessary  for  optimality?  This  is  the 
question addressed by our recent work on generic conditions. Our principal
accomplishment in this direction has been to obtain an easily verifiable
criterion which ensures the genericity of the conditions.

In  some  circumstances,  we  found  that  it  is  necessary  to  modify  the 
strong  conditions  themselves.  This  situation  occurs  when  the  family  includes 
both "fixed" and "variable" constraints. "Fixed" constraints are those that
do  not  vary  with  the  problem  parameters,  while  "variable"  constraints  do. 

The  exact  manner  in  which  the  generic  conditions  depend  on  the  fixed 
constraints  is  also  described  by  our  results. 

II.1-2 Nondifferentiable Optimization

If $f: R^n \to R$ is a locally Lipschitz function, the generalized subdifferential
of $f$ is the set-valued mapping $\partial f: R^n \rightrightarrows R^n$ defined by taking $\partial f(x)$ to be the
convex hull of the set of all limit points of sequences of the form $(\nabla f(x_n))$,
where $x_n \to x$ and $f$ is differentiable at $x_n$. (This definition is due to F.
Clarke [9].) If $f$ happens to be convex, then $\partial f(x)$ is just the set of
"subgradients" of $f$ at $x$, i.e., the set $\{\xi : f(z) - f(x) \ge \langle \xi, z - x \rangle \ \forall z \in R^n\}$.

When  the  generalized  subdifferential  was  first  studied,  the  motive  was 
to  provide  a  tool  that  would  be  of  use  in  handling  optimization  problems  in 
which  a  function  which  is  neither  convex  nor  differentiable  is  to  be  minimized. 
Most  algorithms  for  solving  constrained  or  unconstrained  minimization  problems 
make  heavy  use  of  derivatives  or,  in  the  nondifferentiable  but  convex  case, 
of subgradients. To generalize such algorithms to a broader class of functions,
it is necessary to have a substitute; hence the need for the generalized
subdifferential.

Our  work  in  this  area  has  concentrated  on  the  relationship  between  certain 
properties  of  nondifferentiable  functions  and  properties  of  their  generalized 
subdifferentials. The basic goal has been to identify subclasses of functions
which are both likely to arise in optimization problems and whose subdifferentials
possess properties which are likely to facilitate the development
of algorithms.

Our principal achievement in this direction has been to characterize
the class of "lower-$C^1$" functions in terms of their subdifferentials.
Lower-$C^1$ functions are a desirable class of functions to study because of
the natural way they arise in optimization problems. Anytime a function is
obtained by maximizing in one argument a second function of two arguments
(e.g., $f(x) = \max_s g(x,s)$), one obtains a lower-$C^1$ function, provided the
second function has a continuous derivative and the maximum is taken over
a compact set. Such functions arise in decomposition schemes for minimizing
a function of two arguments.

The most remarkable feature of our characterization of lower-$C^1$ functions
is that the corresponding property of the subdifferential mapping is
so closely related to the "monotone" property that characterizes the
subdifferential of a convex function. Because of this resemblance, we have
coined the word "submonotone" for the related property. The close resemblance
is more than a curiosity. There is reason to hope that the similarity
will facilitate the transfer to nondifferentiable optimization of
algorithms originally intended for convex programming.




II.2 Research and Publications Summary - Generic Conditions

The results of our work in this area form the basis for two articles:
"On optimality conditions for structured families of nonlinear programming
problems" (submitted to Mathematical Programming) and "Second-order optimality
conditions that are necessary with probability one" (to appear in
Proceedings, Symposium on Mathematical Programming with Data Perturbations,
George Washington University, May 1979). The latter article is a survey
without proofs of all our research on this subject to date, while the
former contains the main results and their proofs.

We investigated problems of the general form indexed by a parameter
$p \in P$, with $P \subset R^q$ an open set:

    Q(p):  minimize $f(x,p)$ over all $x \in C \subset R^n$
           satisfying $g_i(x,p) \le 0$ for all $i = 1,\dots,m$, and
           $h_j(x,p) = 0$ for all $j = 1,\dots,k$

This class is more general than $Q(v,u)$ in two important respects. First,
the manner in which $f$, $g_i$ and $h_j$ depend on the parameter is given more
freedom. Rather than requiring perturbation of a special type (e.g., linear
perturbations of the objective function and right-hand-side perturbations of
the constraints), we only required that the family $Q(p)$ satisfy a general
criterion. Second, in addition to the constraints $g_i \le 0$ and $h_j = 0$, which
we refer to as the "variable" constraints, we also investigated the effect
of the "structural" or "fixed" constraint $x \in C$ that does not vary with $p$.

The distinction between these two types of constraints is important because
the two types play different roles in both the analysis of the conditions
and in the statement of the conditions themselves: the conditions that
turn out to be generically necessary for optimality depend on the particular
class of problems under consideration.

Our  principal  accomplishment  here  was  to  give  appropriate  criteria 
for  the  family  Q(p)  which  guarantee  the  genericity  of  the  second-order 
conditions,  and  also  to  describe  the  form  of  the  second-order  conditions 
and  how  they  depend  on  the  fixed  constraint  set  C. 

In order to discuss second-order conditions, we found it necessary to
make certain second-order regularity assumptions about the set $C$. The
conditions that we imposed on the set $C$ were incorporated into our definition
of "cyrtohedron". Cyrtohedra, which we introduced in [4], are piecewise
smooth sets that can be represented locally by a finite number of nonlinear
inequality and equality constraints. A cyrtohedron is a union of submanifolds,
called the "faces" of $C$, and each $x \in C$ belongs to a unique such face.

In a natural way, with each $x \in C$, we can associate the normal cone $N_C(x)$ to
$C$ at $x$, and the tangent space at $x$, $L_C(x)$, to the face containing $x$.

The second-order conditions which we showed to be generically necessary
for optimality are the generalized strong second-order conditions discussed
previously in Spingarn [4]. A triple $(x,y,z) \in C \times R^m \times R^k$ is said to satisfy
these conditions for the problem $Q(p)$ if

(SSOC)  (i)   $x$ is feasible for $Q(p)$

        (ii)  $-\nabla_x L(x,y,z,p) \in \operatorname{relint} N_C(x)$, where $L$ is the usual
              Lagrangian, and "relint" denotes relative interior

        (iii) $y_i > 0$ iff $g_i(x,p) = 0$, for each $i$

        (iv)  The projections onto $L_C(x)$ of the gradients of the
              active constraints are linearly independent

        (v)   If $F$ is the face of $C$ containing $x$, then $\nabla^2 (L|F)(x,y,z,p)(s,s) > 0$
              for all $s \in R^n$ satisfying $0 \ne s \in L_C(x)$, $s$ perpendicular to the
              gradients of the active constraints.

The family $Q(p)$ is full provided the function $p \mapsto \nabla_x L(x,y,z,p)$ has
Jacobian of full rank at all $(x,y,z,p) \in C \times R^m \times R^k \times P$ (where
$L(x,y,z,p) = f(x,p) + \sum_i y_i g_i(x,p) + \sum_j z_j h_j(x,p)$ is the usual Lagrangian).
Our main result is the following:

Theorem 1

Let $C \subset R^n$ be a $d$-dimensional cyrtohedron of class C , $f$ of class $C^2$,
and $g$ and $h$ of class $C^s$ on $R^n \times P$ with $s > \max\{1, d-m\}$. If $Q(p)$ is full, there
is a subset $P_0 \subset P$ with $P \setminus P_0$ having measure zero, such that for all $\bar{p} \in P_0$:
if $\bar{x} \in C$ is a local minimizer for $Q(\bar{p})$, there exists $(\bar{y}, \bar{z}) \in R^m \times R^k$ such
that $(\bar{x}, \bar{y}, \bar{z})$ satisfies SSOC.

II.3 Research and Publications Summary - Nondifferentiable Optimization

We have published our results from this line of work in "Submonotone
subdifferentials of Lipschitz functions" (to appear in Trans. Amer. Math. Soc.).

$f: R^n \to R$ is a lower-$C^1$ function if every $\bar{x} \in R^n$ has a neighborhood $U$ such
that for all $x \in U$, $f(x) = \max_{s \in S} g(x,s)$, where $S$ is some compact set and $g$ and
$\nabla_x g$ are continuous jointly in $x$ and $s$. If $f$ is a locally Lipschitz function
$R^n \to R$, we say that $\partial f$ is strictly submonotone if for all $x \in R^n$,

$$\liminf_{\substack{x_1 \ne x_2 \\ x_i \to x,\ y_i \in \partial f(x_i) \\ i = 1,2}} \frac{\langle x_1 - x_2,\ y_1 - y_2 \rangle}{|x_1 - x_2|} \ge 0$$

Our principal result is the following:

Theorem 2

$f$ is lower-$C^1$ iff $\partial f$ is strictly submonotone.

Notice the close relationship between strict submonotonicity and monotonicity.
The latter property clearly implies the former since if $\partial f$ is monotone, the
numerator in the "lim inf" above is always nonnegative.
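
The quotient appearing in the "lim inf" can be sampled numerically. The sketch below is our own one-dimensional illustration with made-up helper names: for the convex function $f(x) = |x| = \max\{x, -x\}$ the quotient stays nonnegative near $0$, whereas for $f(x) = -|x|$ it stays at $-2$ along the symmetric points $x_1 = t$, $x_2 = -t$, so the subdifferential of $-|x|$ is not strictly submonotone at $0$.

```python
def quotient(x1, y1, x2, y2):
    """<x1 - x2, y1 - y2> / |x1 - x2| for scalars (the case n = 1)."""
    return (x1 - x2) * (y1 - y2) / abs(x1 - x2)


def grad_abs(x):
    """Gradient of |x| at a point x != 0 (the only subgradient there)."""
    return 1.0 if x > 0 else -1.0


def grad_neg_abs(x):
    """Gradient of -|x| at a point x != 0."""
    return -grad_abs(x)


# Symmetric pairs x1 = t, x2 = -t approaching 0.
ts = [10.0 ** (-k) for k in range(1, 8)]
convex_vals = [quotient(t, grad_abs(t), -t, grad_abs(-t)) for t in ts]
concave_vals = [quotient(t, grad_neg_abs(t), -t, grad_neg_abs(-t)) for t in ts]

print(convex_vals)   # nonnegative, consistent with monotonicity
print(concave_vals)  # negative and bounded away from 0
```

By the theorem above, the second computation is consistent with $-|x|$ failing to be lower-$C^1$ near $0$; sampling along one family of pairs is of course only a sanity check, not a proof of the first case.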

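We also investigated the property of "submonotonicity", which is
weaker than strict submonotonicity, and hence also weaker than monotonicity,
since one of the two points in the quotient is held fixed at $x$. $\partial f$ is
submonotone if for all $x \in R^n$,

$$\liminf_{\substack{x' \to x,\ x' \ne x \\ y \in \partial f(x) \\ y' \in \partial f(x')}} \frac{\langle x' - x,\ y' - y \rangle}{|x' - x|} \ge 0$$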

In terms of the function $f$, we showed that the submonotonicity of $\partial f$
corresponds to a certain "regularity" property of the directional derivative
of $f$. We also proved several results which relate submonotonicity to
properties that have been studied by other authors, such as semismoothness
(Mifflin [7]), lower semi-differentiability (Rockafellar [6]),
quasi-differentiability (Pshenichnyi [8]), and Clarke regularity [10]. For
instance, we showed that $\partial f$ is submonotone if $f$ is both semismooth and
Clarke regular.

A GLOBALLY EXACT PENALTY FUNCTION WITHOUT CONVEXITY

Mokhtar S. Bazaraa and Jamie J. Goode


In this paper, we consider the nonlinear programming problem to
minimize $f(x)$ subject to $g_i(x) \le 0$ for $i = 1,\dots,m$ and $x \in X$. If $X$ is
compact, we show under a suitable constraint qualification that a
globally exact penalty function exists. Particularly, we show a
one-to-one correspondence between global optimal solutions to the
original problem and global optimal solutions to the penalty problem
for a sufficiently large, but finite, penalty parameter. A lower
bound on the penalty parameter is established in terms of the
Kuhn-Tucker Lagrangian multipliers and lower bounds on the functions
involved.

1. Introduction


A great deal of attention has been given to the subject of exact penalty
functions where a constrained nonlinear programming problem is transformed
into a single unconstrained problem or into a finite sequence of
unconstrained problems.

Without convexity, the current theory applies only locally. Specifically,
if $\bar{x}$ is a strict local minimum to problem $P_0$ to minimize $f(x)$ subject to
$g_i(x) \le 0$ for $i = 1,\dots,m$, under a suitable constraint qualification, there
exists a number $\lambda_0$ such that $\bar{x}$ is a local optimal solution to the problem to
minimize $\theta(x,\lambda)$ for all $\lambda \ge \lambda_0$, where $\theta(x,\lambda)$ is an appropriate penalty function.

For a review of exact penalty functions, the reader may refer to Evans,
Gould, and Tolle [4], Fletcher [5], Han and Mangasarian [8], Howe [9],
McCormick [11], and Pietrzykowski [12,13]. For the existence of a globally
exact penalty function in the convex case, see Bertsekas [3] and Zangwill
[15].

The main result of this paper is to show, under mild assumptions, the
existence of a globally exact penalty function in the nonconvex case.
Before proceeding, it is worthwhile to briefly review the cases under which
an exact penalty does not exist. In this regard, consider problem $P_0$ and
let $g_i(x)_+ = \max\{0, g_i(x)\}$. Given the penalty parameter $\lambda$, the penalty
problem is to minimize $\theta(x,\lambda)$, where $\theta(x,\lambda) = f(x) + \lambda \sum_{i=1}^{m} g_i(x)_+$. Figure 1 shows,
for $m = 1$, the set $A = \{(g_1(x)_+, f(x)) : x \in R^n\}$. It is clear that if $\bar{x}$ solves
problem $P_0$, then there exists a $\lambda_0$ so that $\bar{x}$ also solves the penalty problem
to minimize $\theta(x,\lambda)$ for all $\lambda \ge \lambda_0$, if and only if there is a nonvertical
supporting hyperplane with slope $-\lambda_0$ to the set $A$ at the point
$(g_1(\bar{x})_+, f(\bar{x}))$. In Figure 1a, such a supporting hyperplane exists, whereas
in Figures 1b and 1c, a globally exact penalty function does not exist. The
case illustrated in Figure 1b can be easily overcome by the stipulation of a
suitable constraint qualification of the kind that is needed to validate
the Kuhn-Tucker conditions.

If we modify problem $P_0$ so that a compact set $X$ is included in the
constraints, yielding the compact set $A' = \{(g_1(x)_+, f(x)) : x \in X\}$, as shown
in Figure 1d, a supporting hyperplane can be found.

In this paper we consider the following problem:

Problem P:  minimize $f(x)$
            subject to $g_i(x) \le 0$ for $i = 1,\dots,m$
                       $x \in X$

We think of the constraints defined by $X$ as easy constraints that must be
handled explicitly and of the constraints $g_i(x) \le 0$ for $i = 1,\dots,m$ as
those that are treated by a penalty function. Typically, $X$ contains lower
and upper bounds on the variables, and possibly linear constraints. As
discussed above, we prove that if $X$ is compact, then under a suitable
constraint qualification a globally exact penalty exists. The penalty
problem under consideration is:

Problem P($\lambda$):  minimize $\theta(x,\lambda)$
               subject to $x \in X$
In this study, we let $\theta(x,\lambda) = f(x) + \lambda \sum_{i=1}^{m} g_i(x)_+$. All the qualitative
results given in this paper are valid if the expression $\sum_{i=1}^{m} g_i(x)_+$ is
replaced by the expression $Q(\|g(x)_+\|)$, where $Q: R_+ \to R_+$ satisfies:

$$Q(0) = 0, \quad Q(\delta) > 0 \text{ for } \delta > 0, \quad \infty > \lim_{\delta \to 0^+} Q(\delta)/\delta > 0$$

This assertion follows directly from a Theorem in [8].
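
The role of the compact set $X$ can be seen on a small nonconvex example. The example is our own and is not taken from the paper: minimize $f(x) = -x^2$ subject to $g(x) = x - 1 \le 0$ with $X = [0, 2]$, whose optimal solution is $x = 1$. The sketch below locates a minimizer of $\theta(\cdot, \lambda)$ over $X$ by brute-force search on a uniform grid, a choice made purely for simplicity.

```python
def theta(x, lam):
    """theta(x, lambda) = f(x) + lambda * max(0, g(x)) for the toy problem
    f(x) = -x**2, g(x) = x - 1 (both chosen only for illustration)."""
    return -x ** 2 + lam * max(0.0, x - 1.0)


def argmin_on_grid(lam, lo=0.0, hi=2.0, n=2001):
    """Brute-force minimizer of theta(., lam) over a uniform grid on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    return min(grid, key=lambda x: theta(x, lam))


for lam in (2.5, 4.0):
    x_best = argmin_on_grid(lam)
    print(lam, round(x_best, 6), round(theta(x_best, lam), 6))
```

In this toy problem the Kuhn-Tucker multiplier at $x = 1$ equals 2, yet with $\lambda = 2.5$ the grid minimizer sits at the infeasible endpoint $x = 2$, while with $\lambda = 4$ it sits at $x = 1$. A parameter large enough for local exactness therefore need not be large enough globally. If the bound $x \le 2$ were dropped, $\theta(x, \lambda)$ would be unbounded below for every $\lambda$, so no finite penalty parameter could work.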

Throughout the paper, we assume that $f$ and $g$ are continuously differentiable,
and that $X$ is closed. Further, we suppose that problem P is consistent.
These assumptions will not be repeated in the statements of the
theorems given in the paper. We also note that equality constraints of
the form $h_i(x) = 0$ can be incorporated without any difficulty.
In order to keep the notation and development simple, we chose
to omit their inclusion.

In Section 2, we give two different sufficient conditions that ensure
the existence of an exact penalty strict local minimum. Using compactness
of $X$ and the fact that a relatively open cover has a finite subcover, we
establish, in Section 3, the existence of a globally exact penalty function.
Finally, in Section 4, we provide some insight into determining the size
of the penalty parameter.

2. Sufficient Conditions for an Exact Penalty Strict Local Minimum


In this section, we show that an exact penalty strict local minimum
exists under two different conditions. These conditions generalize
similar conditions which are available in the literature in that they
handle the presence of the set $X$. Particularly, Theorems 2.1 and 2.2
extend similar results of Howe [9] and Han and Mangasarian [8], respectively.
They assert that there exists a positive number $\lambda_0$ such that if $\bar{x}$ is a
strict local minimum for Problem P, then $\bar{x}$ is also a strict local minimum
for Problem P($\lambda$) for all $\lambda \ge \lambda_0$. These theorems will be used in the next
section to prove our main result showing the existence of a globally exact
penalty function.

The following notation and definitions will be used throughout the
manuscript. Given $x \in X$, let

$$ I_+(x) = \{i: g_i(x) > 0\} $$

$$ I_-(x) = \{i: g_i(x) < 0\} $$

$$ I(x) = \{i: g_i(x) = 0\} $$

$\bar{x}$ is a strict local minimum for Problem P $\iff$ there exists
$\epsilon > 0$ such that $f(x) > f(\bar{x})$ for each $x \ne \bar{x}$, $x \in X$,
such that $\|x - \bar{x}\| < \epsilon$ and $g_i(x) \le 0$ for $i = 1,\ldots,m$.

$\bar{x}$ is a strict local minimum for Problem $P(\lambda)$ $\iff$ there exists
$\epsilon > 0$ such that $\theta(x,\lambda) > \theta(\bar{x},\lambda)$ for each
$x \ne \bar{x}$, $x \in X$, such that $\|x - \bar{x}\| < \epsilon$.

Next, we need to provide suitable tangential approximations to the
set X at a point $x \in X$. Following Rockafellar [14], consider the contingent
cone K(x) and the cone of hypertangents H(x) defined below:

$y \in K(x)$ $\iff$ there exist a sequence $\{y_k\}$ converging to y and
                    a positive sequence $\{\lambda_k\}$ converging to 0 such
                    that $x + \lambda_k y_k \in X$ for each k.

$y \in H(x)$ $\iff$ for each sequence $\{x_k\}$ in X converging to x, there
                    exists a positive sequence $\{\lambda_k\}$ converging to 0
                    such that $x_k + \lambda y \in X$ for all $\lambda \in (0,\lambda_k)$.

Note that H(x) is a convex cone which is not necessarily closed and that
K(x) is a closed cone, but not necessarily convex. Further, $H(x) \subset K(x)$.

Theorem 2.1 below gives a sufficient condition for the existence of an
exact penalty strict local minimum, where the closed convex cone C(x) is
defined by:

$y \in C(x)$ $\iff$ $\nabla g_i(x)^t y \le 0$ for each $i \in I(x)$

Theorem 2.1

Let $\bar{x}$ be feasible for Problem P and suppose that $\nabla f(\bar{x})^t y > 0$
for each $0 \ne y \in C(\bar{x}) \cap K(\bar{x})$. Then:

1.  $\bar{x}$ is a strict local minimum for Problem P.

2.  there is a number $\lambda_0 > 0$ so that for all $\lambda \ge \lambda_0$,
    $\bar{x}$ is a strict local minimum for Problem $P(\lambda)$.


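Proof

Suppose by contradiction to part (1) that there exists a sequence $\{x_k\}$
converging to $\bar{x}$ such that $x_k \ne \bar{x}$, $x_k \in X$, $g_i(x_k) \le 0$
for $i = 1,\ldots,m$, and $f(x_k) \le f(\bar{x})$. Let
$y_k = (x_k - \bar{x})/\|x_k - \bar{x}\|$. Then, $\|y_k\| = 1$ and there exist a
subsequence $\{y_k\}_{\mathcal{K}}$ and a vector y so that $\|y\| = 1$ and
$y_k \to y$ as $k \to \infty$ in $\mathcal{K}$. Then, $y \in K(\bar{x})$. Since
$g_i(x_k) \le 0 = g_i(\bar{x})$ for $i \in I(\bar{x})$, then

$$ \nabla g_i(\bar{x})^t y_k + \frac{R_i(\bar{x}, x_k - \bar{x})}{\|x_k - \bar{x}\|} \le 0 $$  (1)

where $R_i(\bar{x},h)/\|h\| \to 0$ as $\|h\| \to 0$. By taking the limit of (1)
as $k \to \infty$ in $\mathcal{K}$, it follows that $\nabla g_i(\bar{x})^t y \le 0$
for $i \in I(\bar{x})$. Therefore, $y \in C(\bar{x}) \cap K(\bar{x})$. Since
$\|y\| = 1$, then by assumption, $\nabla f(\bar{x})^t y > 0$. But

$$ \frac{f(x_k) - f(\bar{x})}{\|x_k - \bar{x}\|} = \nabla f(\bar{x})^t y_k + \frac{R(\bar{x}, x_k - \bar{x})}{\|x_k - \bar{x}\|} $$  (2)

where $R(\bar{x},h)/\|h\| \to 0$ as $\|h\| \to 0$. Since $f(x_k) \le f(\bar{x})$,
the left hand side of (2) is nonpositive while the right hand side converges
to a positive number as $k \to \infty$ in $\mathcal{K}$. This contradiction
implies that $\bar{x}$ is a strict local minimum for Problem P.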

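To prove part (2), suppose by contradiction that there is a sequence
$\{\lambda_k\}$ such that $\lambda_k \to \infty$ and $\bar{x}$ is not a strict
local minimum for Problem $P(\lambda_k)$. Thus, there is a sequence $\{x_k\}$
converging to $\bar{x}$ so that $\bar{x} \ne x_k \in X$ and

$$ \theta(x_k,\lambda_k) \le \theta(\bar{x},\lambda_k) = f(\bar{x}) $$  (3)

Again, let $y_k = (x_k - \bar{x})/\|x_k - \bar{x}\|$. As in the proof of part (1),
there is a vector $y \in K(\bar{x})$ with $\|y\| = 1$ and a set $\mathcal{K}$ so
that $y_k \to y$ as $k \to \infty$ in $\mathcal{K}$. Now suppose by contradiction
that for some $j \in I(\bar{x})$, $\nabla g_j(\bar{x})^t y > 0$. Since $g_j$ is
continuously differentiable, for k in $\mathcal{K}$ large enough,
$g_j(x_k) > g_j(\bar{x}) = 0$. Hence, by (3)

$$ f(x_k) + \lambda_k g_j(x_k) \le \theta(x_k,\lambda_k) \le f(\bar{x}) $$

so that

$$ \frac{f(x_k) - f(\bar{x})}{\|x_k - \bar{x}\|} + \lambda_k \frac{g_j(x_k) - g_j(\bar{x})}{\|x_k - \bar{x}\|} \le 0 $$  (4)

for large k in $\mathcal{K}$. As $k \in \mathcal{K}$ goes to $\infty$, the first
term in (4) converges to $\nabla f(\bar{x})^t y$,
$[g_j(x_k) - g_j(\bar{x})]/\|x_k - \bar{x}\|$ converges to
$\nabla g_j(\bar{x})^t y > 0$, and $\lambda_k \to \infty$. Since this is
impossible, we conclude that $\nabla g_i(\bar{x})^t y \le 0$ for each
$i \in I(\bar{x})$. Thus $y \in K(\bar{x}) \cap C(\bar{x})$ and so
$\nabla f(\bar{x})^t y > 0$. Since

$$ \frac{f(x_k) - f(\bar{x})}{\|x_k - \bar{x}\|} = \nabla f(\bar{x})^t y_k + \frac{R(\bar{x}, x_k - \bar{x})}{\|x_k - \bar{x}\|} $$

and since $R(\bar{x}, x_k - \bar{x})/\|x_k - \bar{x}\| \to 0$ and
$\nabla f(\bar{x})^t y_k \to \nabla f(\bar{x})^t y > 0$, we conclude that
$f(x_k) - f(\bar{x}) > 0$ for $k \in \mathcal{K}$ large enough. But by (3),
$f(x_k) \le \theta(x_k,\lambda_k) \le f(\bar{x})$, a contradiction. This
completes the proof.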
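To make the statement of Theorem 2.1 concrete, here is a small numerical sketch on a hypothetical one-dimensional instance that is not taken from the paper: minimize $f(x) = x$ subject to $g(x) = -x \le 0$ with $X = R$. At $\bar{x} = 0$ the cone $C(\bar{x}) \cap K(\bar{x})$ is $\{y: y \ge 0\}$ and $\nabla f(\bar{x}) y = y > 0$ for every nonzero such y, so the hypothesis holds. The function `is_strict_local_min` is an illustrative sampled check, not a proof.

```python
def theta(x, lam):
    """Penalty objective for: minimize x subject to -x <= 0."""
    return x + lam * max(-x, 0.0)


def is_strict_local_min(lam, center=0.0, radius=1e-2, samples=200):
    """Sampled check that theta(., lam) is strictly larger near `center`."""
    base = theta(center, lam)
    for k in range(1, samples + 1):
        step = radius * k / samples
        # Compare points on both sides of the candidate minimizer.
        if theta(center + step, lam) <= base or theta(center - step, lam) <= base:
            return False
    return True


for lam in (0.5, 1.0, 2.0, 10.0):
    print(lam, is_strict_local_min(lam))
```

In this instance the sampled check fails for $\lambda \le 1$ and succeeds for $\lambda > 1$, which is consistent with the existence of a threshold $\lambda_0$ asserted in part (2).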

The assumption that $\nabla f(\bar{x})^t y > 0$ for each nonzero vector
$y \in C(\bar{x}) \cap K(\bar{x})$ guarantees that $\bar{x}$ is a strict local
minimum for Problem P. It also acts as a qualification that ensures an exact
penalty strict local minimum. Theorem 2.2 gives a similar result if $\bar{x}$
is a strict local minimum to Problem P and satisfies a suitable constraint
qualification that does not involve the objective function. Theorem 2.2
extends similar results of Pietrzykowski [12] and Han and Mangasarian [8].
The following lemma is needed to prove the theorem.


Lemma 2.1

Let $\bar{x}$ be feasible to Problem P and suppose that there is a vector
$y \in H(\bar{x})$ such that $\nabla g_i(\bar{x})^t y < 0$ for each
$i \in I(\bar{x})$. Let $x_\lambda$ be a local optimal solution to Problem
$P(\lambda)$. If $x_\lambda \to \bar{x}$ as $\lambda \to \infty$, then
$x_\lambda$ is feasible to Problem P for $\lambda$ sufficiently large.


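Proof

Suppose by contradiction that there exist a sequence $\{\lambda_k\}$ and a
sequence $\{x_k\}$ so that $\lambda_k \to \infty$ and $x_k \to \bar{x}$, where
$x_k$ is a local optimal solution to Problem $P(\lambda_k)$ which is not
feasible to Problem P. Since $x_k \to \bar{x}$ and $g_i(\bar{x}) \le 0$ for all
i, then $I_k^+ \cup I_k \subset I(\bar{x})$ for k sufficiently large, where
$I_k^+$ and $I_k$ denote $I_+(x_k)$ and $I(x_k)$, respectively. From [7], the
directional derivative of $\theta(\cdot,\lambda_k)$ at $x_k$ along y is given by:

$$ \theta'(x_k,\lambda_k;y) = \nabla f(x_k)^t y + \lambda_k \sum_{i \in I_k^+} \nabla g_i(x_k)^t y + \lambda_k \sum_{i \in I_k} (\nabla g_i(x_k)^t y)_+ $$  (5)

Since $g_i$ is continuously differentiable and $\nabla g_i(\bar{x})^t y < 0$ for
$i \in I(\bar{x})$, then there is an $\epsilon > 0$ so that
$\nabla g_i(x_k)^t y < -\epsilon$ for $i \in I(\bar{x})$ and for k sufficiently
large. Thus, $(\nabla g_i(x_k)^t y)_+ = 0$ for $i \in I_k$ and from (5) we get:

$$ \theta'(x_k,\lambda_k;y) = \nabla f(x_k)^t y + \lambda_k \sum_{i \in I_k^+} \nabla g_i(x_k)^t y < \nabla f(x_k)^t y - \epsilon \lambda_k |I_k^+| $$  (6)

where $|I_k^+|$ is the number of elements in the set $I_k^+$. Since $x_k$ is not
feasible to Problem P, then $|I_k^+| \ge 1$. Since $\lambda_k \to \infty$ and
$\nabla f(x_k)^t y \to \nabla f(\bar{x})^t y$, then (6) implies that:

$$ \theta'(x_k,\lambda_k;y) < 0 $$  (7)

for k large enough. But, $y \in H(\bar{x})$ and $x_k \to \bar{x}$ so that there
is a $\mu_k > 0$ so that $x_k + \mu y \in X$ for each $\mu \in (0,\mu_k)$. In
view of (7), $x_k$ could not have been a local minimum for Problem
$P(\lambda_k)$. This completes the proof.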

Theorem 2.2

Let $\bar{x}$ be a strict local minimum for Problem P and suppose there is a
vector $y \in H(\bar{x})$ such that $\nabla g_i(\bar{x})^t y < 0$ for each
$i \in I(\bar{x})$. Then, there is a $\lambda_0 > 0$ so that $\bar{x}$ is a
strict local minimum for Problem $P(\lambda)$ for all $\lambda \ge \lambda_0$.
Proof

By Pietrzykowski's theorem [13], there is a number $\lambda_1 > 0$ so that for
$\lambda \ge \lambda_1$ there exist $x_\lambda$ and $\epsilon(\lambda)$ such that:

$$ \|x_\lambda - \bar{x}\| < \epsilon(\lambda) $$  (8)

$$ \lim_{\lambda \to \infty} \epsilon(\lambda) = 0 $$  (9)

$$ \theta(x_\lambda,\lambda) \le \theta(x,\lambda) \ \text{for all} \ x \in X \ \text{with} \ \|x - \bar{x}\| < \epsilon(\lambda) $$  (10)

By (8) and (9), $x_\lambda \to \bar{x}$ as $\lambda \to \infty$. From (8) and
(10), it follows that $x_\lambda$ is a local minimum for Problem $P(\lambda)$.
In view of this and the assumptions of the theorem, it follows that Lemma 2.1
applies, and hence $x_\lambda$ is feasible to Problem P for $\lambda$
sufficiently large. Thus, from (10) we get:

$$ f(x_\lambda) = \theta(x_\lambda,\lambda) \le \theta(\bar{x},\lambda) = f(\bar{x}) $$

Since $x_\lambda$ is feasible for P and $x_\lambda \to \bar{x}$, then
$f(x_\lambda) = f(\bar{x})$ for $\lambda$ large enough. But, since $\bar{x}$ is
a strict local minimum for Problem P, then there is a number $\lambda_0 > 0$
so that $x_\lambda = \bar{x}$ for $\lambda \ge \lambda_0$. Thus, for
$\lambda \ge \lambda_0$, $\bar{x}$ is a local minimum for Problem $P(\lambda)$.
We wish to show that it is strict. If not, there exist a sequence
$\{\lambda_k\}$ and a sequence $\{x_k\}$ so that $\lambda_k \to \infty$,
$x_k \ne \bar{x}$, $x_k \to \bar{x}$, where $x_k$ is a local minimum for Problem
$P(\lambda_k)$. By Lemma 2.1, for k large enough, $x_k$ is feasible to Problem
P. However, since $\bar{x}$ is a local minimum for Problem $P(\lambda)$ for
$\lambda$ sufficiently large, then
$f(\bar{x}) = \theta(\bar{x},\lambda_k) = \theta(x_k,\lambda_k) = f(x_k)$. We
have thus exhibited a sequence $\{x_k\}$ feasible to Problem P so that
$\bar{x} \ne x_k \to \bar{x}$ and $f(x_k) = f(\bar{x})$. This contradicts the
strict local optimality of $\bar{x}$ for Problem P, and the proof is now
complete.
3.  A Globally Exact Penalty Function

In this section, we present our main theorem which asserts the existence
of a globally exact penalty function. This is done by requiring the
set X to be compact, in addition to the existence of a suitable
qualification that guarantees a strict local exact penalty.


Theorem 3.1

Consider Problem P and suppose that the set X is compact. Denote the
set of global optimal solutions to Problem P by Q. Suppose
that for each $x_j \in Q$ one of the following two conditions holds:

a.  $\nabla f(x_j)^t y > 0$ for each $0 \ne y \in C(x_j) \cap K(x_j)$

b.  there exists a vector $y \in H(x_j)$ such that $\nabla g_i(x_j)^t y < 0$
    for all $i \in I(x_j)$

Then there exists a number $\lambda_0 > 0$ such that for
$\lambda \ge \lambda_0$, $x_\lambda$ is a global optimal solution to Problem
$P(\lambda)$ if and only if $x_\lambda \in Q$.

Proof

Denote the optimal objective value to Problem P by $\bar{f}$ and consider the
family of sets $A(\cdot)$ and $B(\cdot)$ defined below:

$$ A(\lambda) = \{x: \theta(x,\lambda) - \bar{f} > 0\} $$  (11)

$$ B(\lambda) = A(\lambda) \cup Q $$  (12)

We first show that $B(\lambda)$ is open in the relative topology of X for
$\lambda$ sufficiently large, that is, given $x \in B(\lambda)$ there exists an
open neighborhood $N_\lambda(x)$ around x so that
$X \cap N_\lambda(x) \subset B(\lambda)$. Since $\theta$ is continuous, then
$A(\lambda)$ is open so that the existence of the desired neighborhood is clear
for $x \in A(\lambda)$. Now suppose that $x = x_j \in Q$. From Theorems 2.1 and
2.2, it follows under conditions (a) or (b) that $x_j$ is a strict local
minimum for Problem $P(\lambda)$ for $\lambda$ sufficiently large. Thus, there
exists a neighborhood $N_\lambda(x_j)$ so that
$\bar{f} = \theta(x_j,\lambda) < \theta(y,\lambda)$ for each
$x_j \ne y \in N_\lambda(x_j) \cap X$, which shows that
$N_\lambda(x_j) \cap X \subset B(\lambda)$.

We have thus proved that there is a number $\lambda_1 > 0$ so that the
collection $\{B(\lambda): \lambda \ge \lambda_1\}$ is a family of open sets
relative to X. Next, we show that this family covers X. Let $x \in X$ and
consider the following three cases:

Case 1.  $f(x) > \bar{f}$

    Here, $\theta(x,\lambda) \ge f(x) > \bar{f}$ for all $\lambda \ge 0$
    so that $x \in A(\lambda) \subset B(\lambda)$ for all $\lambda \ge 0$.

Case 2.  $f(x) < \bar{f}$

    There must exist an index i such that $g_i(x) > 0$. Thus,
    for $\lambda$ large enough, $\theta(x,\lambda) \ge f(x) + \lambda g_i(x) > \bar{f}$
    so that $x \in A(\lambda) \subset B(\lambda)$.

Case 3.  $f(x) = \bar{f}$

    If $g_i(x) > 0$ for some i, as in Case 2, $x \in B(\lambda)$ for $\lambda > 0$.
    If $g_i(x) \le 0$ for $i = 1,\ldots,m$ so that x is feasible to
    Problem P, then $x \in Q$. Thus, $x \in B(\lambda)$ for each $\lambda$.

Since X is compact, this relatively open cover has a finite subcover. Let
$\lambda_0$ be the largest $\lambda$ in this subcover. Noting that
$\lambda' \ge \lambda$ implies that $B(\lambda) \subset B(\lambda')$, then

$$ X \subset B(\lambda) = A(\lambda) \cup Q \ \text{for all} \ \lambda \ge \lambda_0 $$

The above set inclusion can be restated as follows. If $\lambda \ge \lambda_0$
and $x \in X$ then either $\theta(x,\lambda) > \bar{f}$ or else $x \in Q$, in
which case $\theta(x,\lambda) = \bar{f}$. This is the desired result and the
proof is complete.
The following example shows that in order to validate the conclusion
of the above theorem, the qualification given by (a) or (b) in Theorem 3.1
must hold for each global optimal solution to Problem P.


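Example 3.1

Problem P:  minimize $f(x)$
            subject to $g(x) \le 0$
                       $x \in X$

where

$$ f(x_1,x_2) = -x_1^2 - x_2^2 $$

$$ g(x_1,x_2) = \begin{cases} x_2 - (x_1-1)^2 & \text{if } x_1 \le 1 \\ x_2 + (x_1-1)^2 & \text{if } x_1 \ge 1 \end{cases} $$

$$ X = \{(x_1,x_2): x_1 + x_2 \le 2, \ x_1, x_2 \ge 0\} $$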

Note that the set of global optimal solutions Q to Problem P is given by
$\{(0,1), (1,0)\}$. Thus, we have:

At $x_1 = (0,1)^t$:

$$ C(x_1) = \{(y_1,y_2): 2y_1 + y_2 \le 0\} $$

$$ K(x_1) = \mathrm{cl}\, H(x_1) = \{(y_1,y_2): y_1 \ge 0\} $$

Note that $0 \ne y \in C(x_1) \cap K(x_1)$ implies that $y_2 < 0$, so that
$\nabla f(x_1)^t y > 0$. Also, there exists a vector $y \in H(x_1)$ so that
$\nabla g(x_1)^t y < 0$, say $y = (0,-1)^t$. Therefore, both conditions (a)
and (b) of Theorem 3.1 hold at $x_1$.
At $x_2 = (1,0)^t$:

$\nabla f(x_2)^t y > 0$ implies that $y_1 < 0$, but places no restrictions on
$y_2$, while $C(x_2) \cap K(x_2) = \{(y_1,y_2): y_2 = 0\}$, so that condition
(a) of Theorem 3.1 does not hold. Furthermore, $\nabla g(x_2)^t y < 0$ implies
that $y_2 < 0$ so that $y \notin H(x_2)$. Thus, condition (b) of the theorem is
not satisfied.

In summary, the hypotheses of the theorem hold at $x_1$, but not at
$x_2$. That there exists no $\lambda$ such that the global optimal objective
value to Problem $P(\lambda)$ is equal to $\bar{f} = -1$ is obvious by considering
x^  =  (~y>  0)eX  which  yields: 


8(xx,X)  <  9(xx,X)  =  f(^)  +  Ag(xx)+  =  <  -1 
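The failure of global exactness in Example 3.1 can be checked numerically. The sketch below assumes the reading $f(x_1,x_2) = -x_1^2 - x_2^2$ of the objective, which matches the stated optimal value $-1$ at $(0,1)$ and $(1,0)$, and it assumes a particular witness point, $(1 + 1/\lambda, 0)$ for $\lambda \ge 1$ and $(2, 0)$ otherwise. That witness is a choice made here for illustration and may differ from the point used in the original text.

```python
def f(x1, x2):
    return -x1 ** 2 - x2 ** 2


def g(x1, x2):
    # Piecewise constraint function of Example 3.1.
    if x1 <= 1.0:
        return x2 - (x1 - 1.0) ** 2
    return x2 + (x1 - 1.0) ** 2


def in_X(x1, x2):
    return x1 >= 0.0 and x2 >= 0.0 and x1 + x2 <= 2.0


def theta(x1, x2, lam):
    return f(x1, x2) + lam * max(g(x1, x2), 0.0)


def witness(lam):
    """Assumed point of X at which theta(., lam) drops below -1 (lam > 0)."""
    return (1.0 + 1.0 / lam, 0.0) if lam >= 1.0 else (2.0, 0.0)


for lam in (0.5, 1.0, 10.0, 1e3, 1e6):
    p = witness(lam)
    print(lam, p, in_X(*p), theta(*p, lam))
```

For $\lambda \ge 1$ the witness gives $\theta = -1 - 1/\lambda - 1/\lambda^2$, which stays strictly below $-1$ however large $\lambda$ is, so no finite penalty parameter makes the penalty problem globally exact here.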


Since  compactness  of  X  and  continuity  of  f  imply  that  f  is  bounded 
below  on  X,  it  might  at  first  appear  that  this  boundedness  property 
would  ensure  a  global  exact  penalty  problem  if  there  is  a  local  exact 
problem.  The  following  example  shows  that  this  is  not  the  case. 


Example 3.2

Problem P:  minimize $f(x)$
            subject to $g(x) \le 0$
                       $x \in X$


f(x)  * 

g(x)  - 

X  +1 

X  =  {x:  x  >  -4} 


Note  that  Problem  P  has  solution  x  =  -1  with  value  f  =  f  (x)  =  .  f  and 

g  are  both  bounded  in  X.  8(x,A)  =  f (x)  +  Ag(x)+  has  a  local  minimum  at 

x  =  -1  for  each  A  >  .  However,  for  each  A  >  0,  9(x,X)  is  arbitrarily 

o 

close  to  0  when  x  is  large.  Thus,  it  is  not  true  that  x  =  -1  is  a 


global  minimum  of  9(x,A)  for  A  large  enough. 
4.  Estimating the Size of the Penalty Parameter

Theorem 4.1 gives some insight into determining a lower bound on the
penalty parameter in terms of the Kuhn-Tucker multipliers and in terms of
suitable lower bounds of the functions f and g. Conclusion (1) asserts
the existence of a Kuhn-Tucker multiplier vector at an optimal solution to
Problem P. This is assured by assumptions (a) and (b). Here, the former
acts as a qualification and the latter enables us to use separation of
disjoint convex sets. We note that convexity of $K(x_j)$ is not very
restrictive, and indeed holds if X is convex or smooth at $x_j$. Similar
optimality conditions can be found in Bazaraa and Goode [1], Guignard [6],
and Mangasarian [10, p. 168]. Conclusion (2) of the theorem shows the
existence of a strict exact local penalty if the penalty parameter exceeds
the value of each of the Kuhn-Tucker Lagrangian multipliers. Here, again,
assumption (a) is used. This assumption can be replaced by a suitable second
order sufficiency condition. A similar result, in the absence of the set X,
can be found in Han and Mangasarian [8]. Conclusions (3) and (4) yield the
form of the size of the penalty parameter needed for a global exact penalty.


Theorem 4.1

Consider Problem P and suppose that the set X is compact. Denote the
set of global optimal solutions $\{x_1,\ldots,x_h\}$ to Problem P by Q and
denote $f(x_j)$ for $x_j \in Q$ by $\bar{f}$. Suppose that for each $x_j \in Q$
the following conditions hold:

a.  $\nabla f(x_j)^t y > 0$ for each $0 \ne y \in C(x_j) \cap K(x_j)$.

b.  $K(x_j)$ is convex.

Then:

1.  For each $x_j \in Q$ there exist scalars $\mu_{ij} \ge 0$ for
    $i \in I(x_j)$ such that:

    $$ \Big[\nabla f(x_j) + \sum_{i \in I(x_j)} \mu_{ij} \nabla g_i(x_j)\Big]^t y \ge 0 \ \text{for} \ y \in K(x_j) $$

2.  For each $x_j \in Q$, there exists a $\delta_j > 0$ such that
    $x \ne x_j$, $\|x - x_j\| < \delta_j$, and $\lambda \ge \lambda_j$ imply
    that $\theta(x,\lambda) > \theta(x_j,\lambda) = \bar{f}$, where
    $\lambda_j > \max\{\mu_{ij}: i \in I(x_j)\}$.

3.  There exists a number $\epsilon > 0$ so that
    $\sum_{i=1}^{m} g_i(x)_+ \ge \epsilon$ for each $x \in A \cap B$, where
    $A = \{x: f(x) \le \bar{f}\}$,
    $B = \{x \in X: \|x - x_j\| \ge \delta_0 \ \text{for} \ j = 1,\ldots,h\}$,
    and $\delta_0 = \min\{\delta_j: 1 \le j \le h\}$.

4.  For $\lambda \ge \lambda_0$, $x_\lambda$ is a global optimal solution to
    Problem $P(\lambda)$ if and only if $x_\lambda \in Q$, where
    $\lambda_0 = \max\{\lambda_1,\ldots,\lambda_h, (\bar{f} + b)/\epsilon\}$
    and b is such that $f(x) > -b$ for each $x \in X$.


Proof

Part (1)

This part is equivalent to showing that
$-\nabla f(x_j) \in K^*(x_j) + C^*(x_j)$, where
$C^*(x_j) = \{\sum_{i \in I(x_j)} a_{ij} \nabla g_i(x_j): a_{ij} \ge 0 \ \text{for} \ i \in I(x_j)\}$
and $K^*(x_j)$ is the polar cone of $K(x_j)$, that is,
$K^*(x_j) = \{y: y^t z \le 0 \ \text{for each} \ z \in K(x_j)\}$. If this were
not the case, by convexity of $K^*(x_j) + C^*(x_j)$, there exist a nonzero
vector c and a scalar $\alpha$ so that:

$$ -c^t \nabla f(x_j) \ge \alpha $$  (13)

$$ c^t y < \alpha \ \text{for each} \ y \in K^*(x_j) + C^*(x_j) $$  (14)

Since $0 \in K^*(x_j) + C^*(x_j)$, then $\alpha > 0$ from (14). Thus, by (13),
$c^t \nabla f(x_j) < 0$. Letting $y = \sum_{i \in I(x_j)} a_{ij} \nabla g_i(x_j)$
in (14), where $a_{ij} \ge 0$ for $i \in I(x_j)$, it follows that
$\sum_{i \in I(x_j)} a_{ij} c^t \nabla g_i(x_j) < \alpha$. Since this is true
for all $a_{ij} \ge 0$, it follows that $c^t \nabla g_i(x_j) \le 0$ for each
$i \in I(x_j)$ so that $c \in C(x_j)$. Now, consider $z \in K^*(x_j)$. Then
$c^t z \le 0$ because otherwise (14) would not hold for $y = \lambda z$ for
sufficiently large $\lambda > 0$. Since $c^t z \le 0$ for each
$z \in K^*(x_j)$, then $c \in K^{**}(x_j)$, the polar of $K^*(x_j)$. However,
since $K(x_j)$ is a closed convex cone, then $K(x_j) = K^{**}(x_j)$ [2, p. 52].

To summarize, we exhibited a nonzero vector $c \in C(x_j) \cap K(x_j)$ with
the property $c^t \nabla f(x_j) \le 0$. This violates assumption (a). Thus
$-\nabla f(x_j) \in K^*(x_j) + C^*(x_j)$, and part (1) follows.

Part (2)

We first show that $x_j$ is a strict local minimum for Problem
$P(\lambda_j)$. Suppose, by contradiction, that this is not the case. Then,
there is a sequence $\{x_k\}$ in X so that $x_k \to x_j$, $x_k \ne x_j$, and

$$ f(x_k) + \lambda_j \sum_{i=1}^{m} g_i(x_k)_+ = \theta(x_k,\lambda_j) \le \theta(x_j,\lambda_j) = \bar{f} $$  (15)

Let $y_k = (x_k - x_j)/\|x_k - x_j\|$. Then there is an index set
$\mathcal{K}$ of positive integers such that $y_k \to y$ as
$k \in \mathcal{K}$ approaches $\infty$. Note that $\|y\| = 1$ and
$y \in K(x_j)$. It can be easily verified from (15) that

$$ \nabla f(x_j)^t y + \lambda_j \sum_{i \in I(x_j)} (\nabla g_i(x_j)^t y)_+ \le 0 $$  (16)

From Part (1) and (16) above, we get:

$$ 0 \ge \lambda_j \sum_{i \in I(x_j)} (\nabla g_i(x_j)^t y)_+ - \sum_{i \in I(x_j)} \mu_{ij} \nabla g_i(x_j)^t y \ge \sum_{i \in I(x_j)} (\lambda_j - \mu_{ij}) (\nabla g_i(x_j)^t y)_+ $$

Since $\lambda_j > \mu_{ij}$, the above inequality implies that
$(\nabla g_i(x_j)^t y)_+ = 0$, and hence $\nabla g_i(x_j)^t y \le 0$ for
$i \in I(x_j)$. Therefore, $y \in C(x_j) \cap K(x_j)$. By assumption (a),
$\nabla f(x_j)^t y > 0$, which is not possible from (16). Thus $x_j$ is a
strict local minimum of Problem $P(\lambda_j)$, and there must exist a number
$\delta_j > 0$ so that $x \ne x_j$, $\|x - x_j\| < \delta_j$ implies that
$\theta(x,\lambda_j) > \theta(x_j,\lambda_j) = \bar{f}$. Since
$\theta(x,\lambda) \ge \theta(x,\lambda_j)$ for $\lambda \ge \lambda_j$,
part (2) follows.

Part (3)

Consider the following sets:

$$ \bar{B} = \{x: \|x - x_j\| < \delta_0 \ \text{for some} \ x_j \in Q\} $$

$$ E(v) = \Big\{x: \sum_{i=1}^{m} g_i(x)_+ > v\Big\}, \quad v > 0 $$

$$ F(v) = E(v) \cup \bar{B} $$

Obviously, $\bar{B}$, $E(v)$, and $F(v)$ are all open for any $v > 0$.
Furthermore, the open family $\bigcup_{v > 0} F(v)$ covers $A \cap X$. To show
this, let $x \in A \cap X$. If $\sum_{i=1}^{m} g_i(x)_+ = 0$, then x must
belong to Q and hence $x \in \bar{B} \subset F(v)$ for all $v > 0$. If
$\sum_{i=1}^{m} g_i(x)_+ > 0$, then $x \in F(v)$ for any
$v < \sum_{i=1}^{m} g_i(x)_+$. Therefore, there exists a finite subcover, say
$A \cap X \subset E(\epsilon) \cup \bar{B}$ for some $\epsilon > 0$. In other
words, if $x \in X$ is such that $f(x) \le \bar{f}$, then either
$\sum_{i=1}^{m} g_i(x)_+ > \epsilon$ or else $\|x - x_j\| < \delta_0$ for some
$x_j \in Q$. Thus part (3) follows.

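Part (4)

Noting part (2), it suffices to show that $\theta(x,\lambda) > \bar{f}$ for
$x \in B$ and $\lambda \ge \lambda_0$. If $f(x) > \bar{f}$, the result is
immediate. Now suppose that $f(x) \le \bar{f}$ so that $x \in A \cap B$. By
part (3), $\sum_{i=1}^{m} g_i(x)_+ \ge \epsilon$. Thus:

$$ \theta(x,\lambda) = f(x) + \lambda \sum_{i=1}^{m} g_i(x)_+ > -b + \lambda \epsilon \ge -b + \Big(\frac{\bar{f} + b}{\epsilon}\Big) \epsilon = \bar{f} $$

This completes the proof.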
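The chain of inequalities in Part (4) suggests how the global threshold could be assembled once the local quantities are known. The following is a minimal sketch under that reading, namely $\lambda_0 = \max\{\lambda_1,\ldots,\lambda_h, (\bar{f} + b)/\epsilon\}$; the function name and all numerical inputs are hypothetical, and in practice $\epsilon$ and the local thresholds are rarely available in closed form.

```python
def global_penalty_threshold(local_thresholds, f_bar, b, eps):
    """Combine local thresholds lambda_j with the term (f_bar + b) / eps.

    local_thresholds: values lambda_j, each assumed to exceed the
                      Kuhn-Tucker multipliers at the optimum x_j
    f_bar:            optimal objective value of Problem P
    b:                any number with f(x) > -b on X
    eps:              lower bound on the total violation over A intersect B
    """
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    if not local_thresholds:
        raise ValueError("at least one local threshold is required")
    return max(max(local_thresholds), (f_bar + b) / eps)


# Hypothetical data: two optimal solutions, f_bar = -1, f > -4 on X.
print(global_penalty_threshold([1.5, 2.0], f_bar=-1.0, b=4.0, eps=0.5))
print(global_penalty_threshold([1.5, 2.0], f_bar=-1.0, b=4.0, eps=3.0))
```

When $\epsilon$ is small the second term dominates, which reflects the fact that points far from Q with only slight infeasibility force a large penalty parameter.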
An Extension of Armijo's Rule to Minimax and
Quasi-Newton Methods for Constrained Optimization

Mokhtar S. Bazaraa and Jamie J. Goode


In this study, we propose an algorithm for solving a minimax problem
over a closed convex set. At each iteration a direction is found by
solving a problem having a quadratic objective function and then a
suitable step size along that direction is taken through an extension of
Armijo's approximate line search technique. We show that each accumulation
point is a Kuhn-Tucker solution and give a condition that guarantees
convergence of the whole sequence of iterates. The special cases of
unconstrained and constrained nonlinear programming are studied. Through
suitable choices of the quadratic form, our procedure retrieves various
steepest descent and quasi-Newton algorithms for unconstrained optimization.
For the constrained case and using an exact penalty function to handle the
nonlinear constraints, our algorithm resembles that of Han, but differs
from it both in the direction-finding and the step-determination processes.


Key  Words:  Minimax  Problems,  Unconstrained  and  Constrained  Nonlinear 

Programming,  Armijo's  Rule,  Global  Convergence,  Quasi-Newton 
Methods,  Steepest  Descent  Methods 
1.  INTRODUCTION

In this paper we consider the following problem:

P:  minimize $\theta(x)$
    subject to $x \in X$

Here X is a closed convex set in $R^n$ and $\theta$ is of the form:

$$ \theta(x) = f(x) + \sum_{j=1}^{\ell} \alpha_j(x) $$

$$ \alpha_j(x) = \max_{i \in I_j} \{\beta_{ij}(x)\}, \quad j = 1,\ldots,\ell $$

We assume that $I_j$ is a finite set of positive integers and that the
functions f and $\beta_{ij}$ are continuously differentiable on an open set S
that contains X.
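The structure of the objective can be sketched in a few lines of Python. The helper `make_theta` and both instances below are illustrative assumptions: the first encodes $|x|$ with a single group $\{x, -x\}$, and the second encodes a penalty-type objective by taking each group to be $\{\lambda g_j, 0\}$, so that $\alpha_j(x) = \lambda \max\{g_j(x), 0\}$.

```python
def make_theta(f, beta_groups):
    """Build theta(x) = f(x) + sum_j max_{i in I_j} beta_ij(x).

    beta_groups is a list with one entry per j; each entry is the list of
    functions beta_ij for i in I_j.
    """
    def theta(x):
        return f(x) + sum(max(beta(x) for beta in group) for group in beta_groups)
    return theta


# Instance 1: theta(x) = |x|, using f = 0 and one group {x, -x}.
theta_abs = make_theta(lambda x: 0.0, [[lambda x: x, lambda x: -x]])

# Instance 2 (assumed data): f(x) = x, one constraint g(x) = -x, lam = 2.
lam = 2.0
theta_pen = make_theta(lambda x: x, [[lambda x: lam * (-x), lambda x: 0.0]])

print(theta_abs(-3.0), theta_abs(2.5))
print(theta_pen(-0.5), theta_pen(0.5))
```

The second instance shows why an exact penalty function for a constrained program fits the minimax form of Problem P.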

Minimax problems of the above type arise in various contexts and have
been studied by many authors. For an excellent exposition of this subject,
both from theoretical and algorithmic points of view, the reader is referred
to the works of Danskin [5], Demyanov [6], and Demyanov and Malozemov [7].
The reader is also referred to Chatelon, Hearn and Lowe [4] and Han [11] for
the special case of unrestricted minimax problems and to Madsen and
Schjaer-Jacobson [15] for the linearly constrained minimax problem.

In addition to Problem P itself, the special case where $\alpha_j(x) = 0$ for
each j has been extensively studied. In [10], Goldstein described a gradient
projection method for solving the problem to minimize f(x) subject to
$x \in X$, and a similar procedure was proposed by Levitin and Polyak [14].
These methods proceed as follows. Given $x_k$, the next point $x_{k+1}$ is
determined by projecting $x_k - \lambda_k \nabla f(x_k)$ on X, where
$\lambda_k$ is a suitable step size that depends on the Lipschitz constant
associated with $\nabla f$. In [16], McCormick proposed an anti-jamming
procedure for solving the problem in the special case where X consists of
bounds on the variables, and in a joint paper with Tapia [17], the procedure
was extended to the case of a general closed convex set. In [3], Bertsekas
further studied this class of methods with emphasis on the choice of the
step size. He also described various ways of achieving superlinear
convergence.

We also note the class of subgradient optimization methods for solving
the problem to minimize f(x) subject to $x \in X$ in the case where f is
convex but not necessarily differentiable. Similar to the methods described
above, given a point $x_k$, $x_{k+1}$ is computed by projecting
$x_k - \lambda_k \xi_k$ on X, where $\xi_k$ is any subgradient of f at $x_k$.
For conditions on the step size that assure convergence, the reader is
referred to Polyak [18,19].

In this paper, we propose an algorithm for solving Problem P. We concern
ourselves primarily with global convergence properties of the algorithm.
Local and superlinear convergence through appropriate choices of the quadratic
approximation are only discussed very briefly. At any iteration the algorithm
solves a subproblem that finds a search direction and then takes a suitable
step along that direction. In the case where X is polyhedral, the direction
finding problem reduces to a quadratic program, and in that respect, our
method resembles quasi-Newton procedures for solving constrained nonlinear
programs. Our direction-finding problem is also similar to the one proposed
by Han [11] for solving minimax problems and primarily differs from it in
the inclusion of the set X. The step size along the search direction is
obtained through an extension of Armijo's [1] rule that handles the
nondifferentiability of the objective function $\theta$.
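The gradient projection iteration mentioned in this section, in which the next point is the projection of a gradient step onto X, can be sketched as follows for the special case where X is a box of variable bounds. The objective, the bounds, and the fixed step size are hypothetical choices for illustration and are not taken from the cited papers.

```python
def project_box(x, lower, upper):
    """Euclidean projection of x onto the box {z: lower <= z <= upper}."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def gradient_projection_step(x, grad, step, lower, upper):
    """One iteration: move along -grad(x), then project back onto the box."""
    trial = [xi - step * gi for xi, gi in zip(x, grad(x))]
    return project_box(trial, lower, upper)


def grad(x):
    # Gradient of the assumed objective f(x) = (x0 - 2)^2 + (x1 + 1)^2.
    return [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]


x = [0.0, 0.0]
for _ in range(50):
    # The step 0.25 is below 1/L, where L = 2 is the Lipschitz constant of grad.
    x = gradient_projection_step(x, grad, 0.25, [0.0, 0.0], [1.0, 1.0])
print(x)
```

The unconstrained minimizer $(2,-1)$ lies outside the box, and the iterates settle at the corner of the box closest to it.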
In Section 2, we introduce an approximation to the directional
derivative that maintains continuity. This approximation is the key tool in
overcoming the difficulties associated with discontinuity of the directional
derivative in determining a search direction. In Section 3, we present our
algorithm and in Section 4, we prove its convergence to a stationary point.
Section 5 is devoted to various specializations of our method. Particularly,
we discuss the cases of unconstrained and constrained nonlinear programming.
For unconstrained problems, depending on the choice of the direction-finding
problem, our algorithm gives rise to different steepest descent and
Newton-type algorithms coupled with the efficient Armijo's step size rule. For
constrained programs, linear constraints are handled by the set X and
nonlinear constraints are treated by an exact penalty function. As a
byproduct, a slight modification to the method of finding a search direction
for the class of quasi-Newton methods is suggested. This modification
overcomes the difficulty of premature termination in case the linearization
of the feasible region at the current point is empty.
2.  APPROXIMATING THE DIRECTIONAL DERIVATIVE


Note that the objective function $\theta$ is not differentiable but has a
derivative along any direction d. Particularly, the directional derivative
$\theta'(x,d)$ is given by:

$$ \theta'(x,d) = \nabla f(x)^t d + \sum_{j=1}^{\ell} \max_{i \in I_j(x)} \{\nabla \beta_{ij}(x)^t d\} $$  (2.1)

where

$$ I_j(x) = \{i: \beta_{ij}(x) = \alpha_j(x)\} $$  (2.2)

Since $\theta'$ is not continuous in x, a difficulty which could ultimately
lead to jamming, we introduce the following approximate directional
derivative $\theta^*(x,d)$ which is continuous in both x and d:

$$ \theta^*(x,d) = f(x) + \nabla f(x)^t d + \sum_{j=1}^{\ell} \max_{i \in I_j} \{\beta_{ij}(x) + \nabla \beta_{ij}(x)^t d\} - \theta(x) $$  (2.3)

If the functions f and $\beta_{ij}$ satisfy a strong version of
differentiability, which we refer to as upper uniform differentiability, then
a one-sided second order approximation of $\theta(x + \lambda d)$ using the
pseudo directional derivative $\theta^*(x,d)$ can be devised. As will be seen
in the remainder of the paper, this approximation is instrumental in proving
convergence of the proposed algorithm.
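The difference between the directional derivative (2.1) and the approximation (2.3) can be seen numerically. The sketch below treats scalar x for simplicity and represents each group $I_j$ as a list of hypothetical `(beta, grad_beta)` pairs; the tolerance used to detect the active set $I_j(x)$ is an implementation assumption. The instance is $\theta(x) = |x|$ with $f = 0$.

```python
def directional_derivative(x, d, grad_f, groups, tol=1e-12):
    """theta'(x, d) as in (2.1): the max runs over the active indices only."""
    total = grad_f(x) * d
    for group in groups:
        alpha = max(beta(x) for beta, _ in group)
        total += max(gb(x) * d for beta, gb in group if beta(x) >= alpha - tol)
    return total


def pseudo_directional_derivative(x, d, grad_f, groups):
    """theta*(x, d) as in (2.3), after cancelling f(x) against theta(x)."""
    total = grad_f(x) * d
    for group in groups:
        alpha = max(beta(x) for beta, _ in group)
        total += max(beta(x) + gb(x) * d for beta, gb in group) - alpha
    return total


# theta(x) = |x|: f = 0 and one group {x, -x} with gradients {1, -1}.
zero_grad = lambda x: 0.0
abs_groups = [[(lambda x: x, lambda x: 1.0), (lambda x: -x, lambda x: -1.0)]]

for x in (-0.01, -0.0001, 0.0):
    print(x,
          directional_derivative(x, 1.0, zero_grad, abs_groups),
          pseudo_directional_derivative(x, 1.0, zero_grad, abs_groups))
```

As x approaches 0 from the left with $d = 1$, $\theta'$ stays at $-1$ and then jumps to $1$ at $x = 0$, while $\theta^*$ approaches $1$ continuously.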
Definition 2.1

Let S be an open convex set in $R^n$ and let $f: R^n \to R$. f is said to be
upper uniformly differentiable in S if f is continuously differentiable in
S and if there is a number $K_f > 0$ so that

$$ f(x+d) \le f(x) + \nabla f(x)^t d + \tfrac{1}{2} K_f \|d\|^2 $$  (2.4)

whenever $x, x+d \in S$.

Note that if f has a Lipschitz continuous derivative in S then it is
upper uniformly differentiable. That is, if there is a number
$\tfrac{1}{2} K_f$ so that

$$ \|\nabla f(y) - \nabla f(x)\| \le \tfrac{1}{2} K_f \|x - y\| \ \text{for} \ x, y \in S $$

then for x and d such that $x, x+d \in S$, by the mean value theorem, we can
write

$$ f(x+d) - f(x) = \nabla f(y)^t d $$

for some y between x and x+d. But then

$$ f(x+d) - f(x) - \nabla f(x)^t d = [\nabla f(y) - \nabla f(x)]^t d \le \tfrac{1}{2} K_f \|y - x\| \, \|d\| \le \tfrac{1}{2} K_f \|d\|^2 $$

and hence f is upper uniformly differentiable in S.
Lemma 2.1

Let S be an open convex set in $R^n$ and suppose that f and $\beta_{ij}$ for
$i \in I_j$ and $j = 1,\ldots,\ell$ are upper uniformly differentiable in S.
Then, there is a number $K > 0$ so that the following hold for all
$x, x+d \in S$:

1.  $\theta(x+d) \le \theta(x) + \theta^*(x,d) + \tfrac{1}{2} K \|d\|^2$

2.  $\theta^*(x,\lambda d) \le \lambda \theta^*(x,d)$ for all $\lambda \in [0,1]$

3.  $\theta(x + \lambda d) \le \theta(x) + \lambda \theta^*(x,d) + \tfrac{1}{2} \lambda^2 K \|d\|^2$ for all $\lambda \in [0,1]$

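Proof

Since $f$ and $\beta_{ij}$ are upper uniformly differentiable, then there exist scalars $K_f$ and $K_{ij} > 0$ so that:

$$ f(x+d) \le f(x) + \nabla f(x)^t d + \tfrac{1}{2} K_f \|d\|^2 \qquad (2.5) $$

$$ \beta_{ij}(x+d) \le \beta_{ij}(x) + \nabla \beta_{ij}(x)^t d + \tfrac{1}{2} K_{ij} \|d\|^2 \qquad (2.6) $$

for all $x, x+d \in S$. Let $K_j = \max_{i \in I_j} K_{ij}$ and suppose that $x, x+d \in S$. Then from (2.6) we get:

$$ \beta_{ij}(x+d) \le \beta_{ij}(x) + \nabla \beta_{ij}(x)^t d + \tfrac{1}{2} K_{ij} \|d\|^2 $$
$$ \le \max_{r \in I_j} \left( \beta_{rj}(x) + \nabla \beta_{rj}(x)^t d \right) + \tfrac{1}{2} K_j \|d\|^2 $$
$$ = \alpha_j(x) + \alpha_j^*(x,d) + \tfrac{1}{2} K_j \|d\|^2 \qquad (2.7) $$

where

$$ \alpha_j^*(x,d) = \max_{i \in I_j} \left( \beta_{ij}(x) + \nabla \beta_{ij}(x)^t d \right) - \alpha_j(x) \qquad (2.8) $$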
Since (2.7) holds for each $i \in I_j$, then

$$ \alpha_j(x+d) \le \alpha_j(x) + \alpha_j^*(x,d) + \tfrac{1}{2} K_j \|d\|^2 \qquad (2.9) $$

Summing (2.5) and (2.9) for $j = 1,\dots,\ell$ and noting (2.3) and (2.8) we get:

$$ \theta(x+d) \le \theta(x) + \theta^*(x,d) + \tfrac{1}{2} K \|d\|^2 $$

where

$$ K = K_f + \sum_{j=1}^{\ell} K_j \qquad (2.10) $$

which proves part (1).


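Now let $\lambda \in [0,1]$ and consider $\alpha_j^*(x,\lambda d)$ below:

$$ \alpha_j^*(x,\lambda d) = \max_{i \in I_j} \{ \beta_{ij}(x) + \lambda \nabla \beta_{ij}(x)^t d \} - \alpha_j(x) $$
$$ = \max_{i \in I_j} \{ \lambda [\beta_{ij}(x) + \nabla \beta_{ij}(x)^t d] + (1-\lambda) \beta_{ij}(x) \} - \alpha_j(x) $$
$$ \le \lambda [\alpha_j(x) + \alpha_j^*(x,d)] + (1-\lambda) \alpha_j(x) - \alpha_j(x) $$
$$ = \lambda \alpha_j^*(x,d) \qquad (2.11) $$

Thus, part (2) follows immediately from (2.11). Now part (3) is obvious from parts (1) and (2) and the proof is complete.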
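
The three conclusions of Lemma 2.1 can be checked numerically on a small instance. The sketch below is illustrative only: the one-dimensional functions `f`, the two `BETAS` entries, and the constant `K = 3` are assumptions chosen so that (2.5) and (2.6) hold with $K_f = 1$ and $K_1 = 2$; none of them come from the text. The pseudo directional derivative is computed as $\theta^*(x,d) = \nabla f(x)^t d + \alpha^*(x,d)$ with $\alpha^*$ as in (2.8).

```python
# Illustrative check of Lemma 2.1 in one dimension (all functions are assumptions).
def f(x):
    return 0.5 * x * x

def df(x):
    return x

# Pairs (beta_i, derivative of beta_i) for a single max term (l = 1).
BETAS = [
    (lambda x: x * x - 1.0, lambda x: 2.0 * x),
    (lambda x: -x, lambda x: -1.0),
]

def alpha(x):
    # alpha(x) = max_i beta_i(x)
    return max(beta(x) for beta, _ in BETAS)

def theta(x):
    return f(x) + alpha(x)

def theta_star(x, d):
    # Linearize each beta_i at x, take the max, and subtract alpha(x), as in (2.8).
    linearized = max(beta(x) + dbeta(x) * d for beta, dbeta in BETAS)
    return df(x) * d + linearized - alpha(x)

K = 3.0  # K_f + K_1 = 1 + 2 for these particular quadratics

def lemma_holds(x, d, lam, tol=1e-9):
    ts = theta_star(x, d)
    part1 = theta(x + d) <= theta(x) + ts + 0.5 * K * d * d + tol
    part2 = theta_star(x, lam * d) <= lam * ts + tol
    part3 = theta(x + lam * d) <= theta(x) + lam * ts + 0.5 * lam ** 2 * K * d * d + tol
    return part1 and part2 and part3

GRID = [i / 4.0 for i in range(-8, 9)]
LAMBDAS = [0.0, 0.25, 0.5, 0.75, 1.0]
all_ok = all(lemma_holds(x, d, lam) for x in GRID for d in GRID for lam in LAMBDAS)
print(all_ok)
```

A grid check of this kind cannot prove the lemma; it only guards against transcription errors in the inequalities.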
It is well known that

$$ \theta(x+\lambda d) = \theta(x) + \lambda \theta'(x,d) + o(d,\lambda) $$

where

$$ \frac{o(d,\lambda)}{\lambda} \to 0 \quad \text{as } \lambda \to 0^+ $$

uniformly in $d$ with $\|d\| = 1$ (see for example Demyanov and Malozemov [7, p.53]). However, conclusion (3) of the lemma would be false if $\theta^*(x,d)$ is replaced by $\theta'(x,d)$. This is evident by considering $\theta(x) = |x|$ which corresponds to $f(x) = 0$, $\ell = 1$, $\beta_{11}(x) = x$ and $\beta_{21}(x) = -x$.
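
The failure of conclusion (3) for the ordinary directional derivative can be seen with concrete numbers. The sketch below is a hedged illustration for $\theta(x) = |x|$: the point `x`, direction `d`, step `lam`, and constant `K` are arbitrary choices of ours, and `part3_gap` is a hypothetical helper that returns the left side of (3) minus its right side, so a positive value means the inequality is violated.

```python
# theta(x) = |x| written as max(x, -x); all numeric values are illustrative.
def theta(x):
    return abs(x)

def theta_star(x, d):
    # Pseudo directional derivative: max(x + d, -(x + d)) - |x|
    return abs(x + d) - abs(x)

def theta_prime(x, d):
    # One-sided directional derivative of |x| at x in direction d
    if x > 0:
        return d
    if x < 0:
        return -d
    return abs(d)

def part3_gap(x, d, lam, K, deriv):
    """Left side minus right side of conclusion (3); positive means violated."""
    rhs = theta(x) + lam * deriv(x, d) + 0.5 * lam ** 2 * K * d * d
    return theta(x + lam * d) - rhs

x, d, lam, K = -1e-3, 1.0, 1e-2, 10.0
gap_star = part3_gap(x, d, lam, K, theta_star)    # inequality holds
gap_prime = part3_gap(x, d, lam, K, theta_prime)  # inequality fails
print(gap_star <= 0.0, gap_prime > 0.0)
```

Moving `x` closer to zero from the left while keeping `lam` larger than `|x|` produces a violation for any fixed `K`, which is why no uniform constant exists when $\theta'$ is used.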
3. DESCRIPTION OF THE ALGORITHM

We present below a procedure for solving Problem P.

Initialization Step

Choose $x_1 \in X$ and choose $\delta_1, \delta_2$ with $0 < 2\delta_1 < \delta_2$. Let $k = 1$ and go to Step 1.

Step 1 (Find a direction)

Given $x_k \in X$, let $B_k$ be a positive semidefinite matrix satisfying

$$ d^t B_k d \le \delta_2 \|d\|^2 \quad \text{for all } d \in R^n $$

Consider Problem $D(x_k)$ below:

$$ D(x_k): \quad \text{minimize } \theta^*(x_k,d) + \tfrac{1}{2} d^t B_k d \quad \text{subject to } x_k + d \in X $$

If Problem $D(x_k)$ has an unbounded optimal solution go to Step 2. Otherwise, let $d_k$ be an optimal solution to Problem $D(x_k)$. If $\theta^*(x_k,d_k) = 0$, stop. If $\theta^*(x_k,d_k) \le -\delta_1 \|d_k\|^2$, go to Step 3. If $\theta^*(x_k,d_k) > -\delta_1 \|d_k\|^2$, go to Step 2.

Step 2 (Modify the search direction)

Replace $B_k$ by $[1 - (2\delta_1/\delta_2)] B_k + 2\delta_1 I$. Let $d_k$ be an optimal solution to Problem $D(x_k)$ and go to Step 3.
Step 3 (Find Armijo step size)

Given $x_k$ and $d_k$, let $m_k$ be the smallest nonnegative integer $\nu$ such that:

$$ \theta\left(x_k + (\tfrac{1}{2})^{\nu} d_k\right) - \theta(x_k) \le (\tfrac{1}{2})^{\nu+1} \theta^*(x_k,d_k) $$

Let $x_{k+1} = x_k + (\tfrac{1}{2})^{m_k} d_k$. Replace $k$ by $k+1$ and go to Step 1.
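
The step-size rule of Step 3 can be sketched in a few lines. The function name `armijo_step`, the cap `max_halvings`, and the smooth test function below are assumptions made for illustration; in particular, the example takes $\alpha_j = 0$ so that $\theta = f$ and $\theta^*(x,d) = \nabla f(x)^t d$.

```python
def armijo_step(theta, x, d, theta_star_val, max_halvings=60):
    """Return (m, x_new) where m is the smallest nonnegative integer with
    theta(x + 2**-m * d) - theta(x) <= 2**-(m+1) * theta_star_val."""
    base = theta(x)
    for m in range(max_halvings + 1):
        step = 0.5 ** m
        x_new = [xi + step * di for xi, di in zip(x, d)]
        if theta(x_new) - base <= 0.5 * step * theta_star_val:
            return m, x_new
    raise RuntimeError("no acceptable step found within max_halvings")

# Illustrative smooth example: theta(x) = 2 * (x0^2 + x1^2)
def theta(x):
    return 2.0 * (x[0] ** 2 + x[1] ** 2)

x = [1.0, 1.0]
grad = [4.0 * x[0], 4.0 * x[1]]
d = [-g for g in grad]                                # steepest descent direction
theta_star_val = sum(g * di for g, di in zip(grad, d))  # equals -32 here
m, x_new = armijo_step(theta, x, d, theta_star_val)
print(m, x_new)
```

The cap on the number of halvings is a practical safeguard of ours; the text instead bounds $m_k$ analytically in Lemma 3.1.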

By convexity of $X$ it is clear that the algorithm always generates feasible points to Problem P so that $x_k \in X$ for each $k$. The direction-finding problem is equivalent to:

$$ D'(x_k): \quad \text{minimize } f(x_k) + \nabla f(x_k)^t d + \sum_{j=1}^{\ell} y_j - \theta(x_k) + \tfrac{1}{2} d^t B_k d $$
$$ \text{subject to } y_j \ge \beta_{ij}(x_k) + \nabla \beta_{ij}(x_k)^t d, \quad i \in I_j,\ j = 1,\dots,\ell $$
$$ x_k + d \in X $$

In the next section, we show that $\theta^*(x_k,d_k) = 0$ if and only if $x_k$ is a Kuhn-Tucker point to Problem P' defined below:

$$ P': \quad \text{minimize } f(x) + \sum_{j=1}^{\ell} y_j $$
$$ \text{subject to } y_j \ge \beta_{ij}(x), \quad i \in I_j,\ j = 1,\dots,\ell $$
$$ x \in X $$

Since this latter problem is equivalent to Problem P, then the algorithm stops only when a Kuhn-Tucker solution is at hand.

If $X$ is polyhedral, then Problem $D(x_k)$ is a convex quadratic program. Note that in Step 1, we do not require $B_k$ to be positive definite. In fact, the case where $B_k = 0$ is of special interest since it leads to a linear program. If the optimal solution is unbounded, however, $B_k$ is modified slightly in Step 2 in order to guarantee a bounded optimal solution $d_k$. Note that the identity in Step 2 can be replaced by another sufficiently positive definite matrix if that is deemed more desirable.

Step 2 is also needed for cases where the pseudo directional derivative $\theta^*$ goes to zero too fast compared to $\|d_k\|$. This would cause the Armijo integers $m_k$ to become large. Step 2 recomputes $d_k$ with a positive definite quadratic form to prevent this and to assure the uniform upper bound on $m_k$ given by Lemma 3.1. Note also that if Step 2 is used then the new vector $d_k$ automatically satisfies $\theta^*(x_k,d_k) \le -\delta_1 \|d_k\|^2$. It is also interesting to note that if $d_k^t B_k d_k \ge 2\delta_1 \|d_k\|^2$ at Step 1 then Step 2 is not needed. This follows directly from the fact that $0 \ge \theta^*(x_k,d_k) + \tfrac{1}{2} d_k^t B_k d_k$. Therefore, if $B_k$ is chosen to be sufficiently positive definite so that $d^t B_k d \ge 2\delta_1 \|d\|^2$ for all $d \in R^n$, then Step 2 is never used. As will be demonstrated in Section 5, in some special cases, we can devise schemes for generating a matrix $B_k$ that is not positive definite in such a way that it is a priori guaranteed that $d_k^t B_k d_k \ge 2\delta_1 \|d_k\|^2$, which eliminates the need for Step 2.
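
The effect of the Step 2 modification on the quadratic form can be verified directly. The following sketch assumes the replacement rule $B_k \leftarrow [1 - (2\delta_1/\delta_2)] B_k + 2\delta_1 I$; the helper names `modify` and `quad` and the sample matrix are hypothetical. For a positive semidefinite $B_k$ with $d^t B_k d \le \delta_2 \|d\|^2$, the modified matrix should satisfy $2\delta_1 \|d\|^2 \le d^t B_k d \le \delta_2 \|d\|^2$.

```python
def modify(B, delta1, delta2):
    """Step 2 replacement: (1 - 2*delta1/delta2) * B + 2*delta1 * I."""
    n = len(B)
    scale = 1.0 - 2.0 * delta1 / delta2
    return [[scale * B[i][j] + (2.0 * delta1 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def quad(B, d):
    """Quadratic form d^t B d for a dense matrix stored as nested lists."""
    n = len(d)
    return sum(d[i] * B[i][j] * d[j] for i in range(n) for j in range(n))

delta1, delta2 = 1.0, 3.0            # satisfies 0 < 2*delta1 < delta2
B = [[0.0, 0.0], [0.0, 3.0]]         # singular, with d^t B d <= 3 ||d||^2
B_new = modify(B, delta1, delta2)

directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-2.0, 0.5)]
bounds_ok = all(
    2.0 * delta1 * (d[0] ** 2 + d[1] ** 2) - 1e-9
    <= quad(B_new, d)
    <= delta2 * (d[0] ** 2 + d[1] ** 2) + 1e-9
    for d in directions
)
print(bounds_ok)
```

The singular matrix `B` gives zero curvature along the first coordinate, which is the situation in which the unmodified direction-finding problem may be unbounded.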


Lemma 3.1

The integers $m_k$ defined in Step 3 of the algorithm exist and $m_k \le [y] + 1$, where $[y]$ is the greatest integer in $y$, and $y = \ln(K/\delta_1)/\ln 2$, where $K$ is given by (2.10).

Proof

By part (3) of Lemma 2.1 we have:

$$ \theta\left(x_k + (\tfrac{1}{2})^{\nu} d_k\right) - \theta(x_k) \le (\tfrac{1}{2})^{\nu} \theta^*(x_k,d_k) + (\tfrac{1}{2})^{2\nu+1} K \|d_k\|^2 $$
4. GLOBAL CONVERGENCE

In this section, we prove global convergence of the scheme described in Section 3. The following two lemmas are needed. Lemma 4.1 asserts that the algorithm stops only if the point at hand is a Kuhn-Tucker solution to Problem P', which is equivalent to Problem P. The second lemma shows that if $\bar{x}, \bar{x} + \bar{d} \in X$ and if $\{x_k\}$ in $X$ converges to $\bar{x}$, then there is a direction $d$ sufficiently close to $\bar{d}$ such that $x_k + d \in X$ for large $k$.

Lemma 4.1

Let $\bar{x} \in X$. Then $(\bar{x}, \alpha(\bar{x}))$ is a Kuhn-Tucker solution to Problem P' if and only if $\theta^*(\bar{x},\bar{d}) = 0$, where $\bar{d}$ is any optimal solution to Problem $D(\bar{x})$ to minimize $\theta^*(\bar{x},d) + \tfrac{1}{2} d^t B d$ subject to $\bar{x} + d \in X$ and where $B$ is positive semidefinite.

Proof

Let $\bar{d}$ be an optimal solution to Problem $D(\bar{x})$. Further suppose that $\theta^*(\bar{x},\bar{d}) = 0$. Since $d = 0$ is feasible to Problem $D(\bar{x})$ and has an objective value equal to 0, and since $B$ is positive semidefinite, then $\bar{d}^t B \bar{d} = 0$. Thus, the optimal objective value is equal to 0 so that $\hat{d} = 0$ is an optimal solution to Problem $D(\bar{x})$. Therefore, $(\hat{d} = 0, \hat{y} = \alpha(\bar{x}))$ is an optimal solution to Problem $D'(\bar{x})$. This further implies that the Fritz John conditions stated in [2] hold at $(\hat{d},\hat{y})$. That is, there exist nonnegative scalars $u_0$ and $v_{ij}$, not all equal to 0, such that:
$$ v_{ij} [\hat{y}_j - \beta_{ij}(\bar{x}) - \nabla \beta_{ij}(\bar{x})^t \hat{d}] = 0, \quad i \in I_j,\ j = 1,\dots,\ell \qquad (4.3) $$

Note that $u_0 > 0$ because if $u_0 = 0$ then by (4.2), $v_{ij} = 0$ for all $i,j$, which is impossible. Noting that $\hat{d} = 0$ and that $u_0 > 0$, (4.1), (4.2), and (4.3) show that $(\bar{x},\alpha(\bar{x}))$ satisfies the Kuhn-Tucker conditions for Problem P'.

Conversely, suppose that $(\bar{x}, \hat{y} = \alpha(\bar{x}))$ is a Kuhn-Tucker solution to Problem P'. Then there exist scalars $u_{ij} \ge 0$ for $i \in I_j$ and $j = 1,\dots,\ell$ such that:

$$ \Big[ \nabla f(\bar{x}) + \sum_{j=1}^{\ell} \sum_{i \in I_j} u_{ij} \nabla \beta_{ij}(\bar{x}) \Big]^t d \ge 0 \quad \text{if } \bar{x} + d \in X $$

$$ \sum_{i \in I_j} u_{ij} = 1, \quad j = 1,\dots,\ell $$

$$ u_{ij} [\hat{y}_j - \beta_{ij}(\bar{x})] = 0, \quad i \in I_j,\ j = 1,\dots,\ell $$

These conditions are precisely (4.1), (4.2), and (4.3) with $\hat{d} = 0$, $u_0 = 1$, $v_{ij} = u_{ij}$. Therefore, $(\hat{d} = 0, \hat{y} = \alpha(\bar{x}))$ is a Kuhn-Tucker solution to Problem $D'(\bar{x})$. Since this problem is convex, then this solution is optimal. Clearly, Problems $D(\bar{x})$ and $D'(\bar{x})$ are equivalent and hence $\hat{d} = 0$ is an optimal solution to Problem $D(\bar{x})$. Thus the optimal objective value is equal to 0, and hence any optimal solution $\bar{d}$ to Problem $D(\bar{x})$ must satisfy $\theta^*(\bar{x},\bar{d}) = 0$. This follows because if $\theta^*(\bar{x},\bar{d}) < -z$ for some $z > 0$, then

$$ \theta^*(\bar{x},\lambda \bar{d}) + \tfrac{1}{2} \lambda^2 \bar{d}^t B \bar{d} \le -\lambda z + \tfrac{1}{2} \lambda^2 \bar{d}^t B \bar{d} < 0 $$

for $\lambda > 0$ and sufficiently small, violating the fact that the optimal objective value for Problem $D(\bar{x})$ is equal to 0. This completes the proof.
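Lemma 4.2

Let $X$ be a convex set in $R^n$ and let $\bar{x} \in X$. Let $\bar{d} \ne 0$ be such that $\bar{x} + \bar{d} \in X$ and let $\{x_k\}$ be a sequence in $X$ converging to $\bar{x}$. Then given an $\epsilon > 0$, there exists a vector $d$ such that $\|d - \bar{d}\| < \epsilon$ and $x_k + d \in X$ for $k$ sufficiently large.

Proof

Let $\mathrm{ri}(X)$ denote the relative interior of $X$. Then there exists a point $y \ne \bar{x} + \bar{d}$ such that $y \in \mathrm{ri}(X)$. Now consider $d$ given by

$$ d = \bar{d} + \delta \, \frac{y - \bar{x} - \bar{d}}{\|y - \bar{x} - \bar{d}\|} $$

where $0 < \delta < \min\{\epsilon, \|y - \bar{x} - \bar{d}\|\}$. Then

$$ \bar{x} + d = (\bar{x} + \bar{d}) + \delta \, \frac{y - \bar{x} - \bar{d}}{\|y - \bar{x} - \bar{d}\|} = \frac{\delta}{\|y - \bar{x} - \bar{d}\|} \, y + \Big( 1 - \frac{\delta}{\|y - \bar{x} - \bar{d}\|} \Big) (\bar{x} + \bar{d}) $$

Thus, $\bar{x} + d$ is a convex combination of $y$ and $\bar{x} + \bar{d}$ so that $\bar{x} + d \in \mathrm{ri}(X)$. Therefore, there exists a $z > 0$ so that if $\|\bar{x} + d - h\| < z$ and if $h$ lies in the affine manifold generated by $X$ then $h \in X$. Since $x_k, \bar{x}, \bar{x} + d \in X$, it is clear that $x_k + d$ is in the affine manifold generated by $X$. Now let $h = x_k + d$. Then $\|\bar{x} + d - h\| = \|\bar{x} - x_k\|$, and since $x_k \to \bar{x}$, it follows that $\|(\bar{x} + d) - (x_k + d)\| < z$ for $k$ sufficiently large so that $x_k + d \in X$. This completes the proof.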

Now we are ready to state our main convergence theorem. The theorem shows that each accumulation point $\bar{x}$ corresponds to a Kuhn-Tucker solution $(\bar{x},\alpha(\bar{x}))$ to Problem P'. As a corollary, we demonstrate that if $\bar{x}$ is a strong local minimum then indeed the whole sequence $\{x_k\}$ converges to $\bar{x}$. Here, $\bar{x}$ is a strong local minimum to Problem P if there exists a number $\gamma > 0$ so that for each $\delta > 0$ there is a number $z(\delta) > 0$ so that

$$ x \in X,\ \|x - \bar{x}\| < \gamma, \text{ and } \theta(x) - \theta(\bar{x}) < z(\delta) \ \Rightarrow\ \|x - \bar{x}\| < \delta \qquad (4.4) $$


Theorem 4.1

Consider the algorithm described in Section 3 for solving Problem P. If the algorithm stops at iteration $k$ then $(x_k,\alpha(x_k))$ is a Kuhn-Tucker point for Problem P'. Otherwise the algorithm generates an infinite sequence $\{(x_k,d_k)\}$. In this case, if $(\bar{x},\bar{d})$ is an accumulation point, then:

1. $\lim_{k \to \infty} d_k = 0$ and in particular $\bar{d} = 0$.

2. $(\bar{x},\alpha(\bar{x}))$ is a Kuhn-Tucker point for Problem P'.


Proof

If the algorithm stops at iteration $k$ then $\theta^*(x_k,d_k) = 0$ and by Lemma 4.1 it follows that $(x_k,\alpha(x_k))$ is a Kuhn-Tucker point for Problem P'. Now suppose that the algorithm generates the infinite sequence $\{(x_k,d_k)\}$ and suppose that there is an infinite set $\mathcal{K}$ of positive integers such that $(x_k,d_k) \to (\bar{x},\bar{d})$ for $k \in \mathcal{K}$. First, note that $\theta(x_k)$ is decreasing and that $\theta(x_k) \to \theta(\bar{x})$ for $k \in \mathcal{K}$, and hence $\lim_{k \to \infty} \theta(x_k) = \theta(\bar{x})$. Also we have

$$ \theta(x_{k+1}) - \theta(x_k) \le (\tfrac{1}{2})^{m_k+1} \theta^*(x_k,d_k) \le -(\tfrac{1}{2})^{m_k+1} \delta_1 \|d_k\|^2 \quad \text{for all } k $$

and hence the right hand side must converge to 0. But by Lemma 3.1, $m_k$ is bounded above so that $d_k \to 0$, and particularly $\bar{d} = 0$. This proves part (1).
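Since $d^t B_k d \le \delta_2 \|d\|^2$ for all $d \in R^n$ and all $k$, then there exists an infinite set of positive integers $\mathcal{K}' \subseteq \mathcal{K}$ such that $B_k \to B$ for $k \in \mathcal{K}'$, and furthermore $B$ is positive semidefinite. Now, suppose by contradiction to part (2) that $(\bar{x},\alpha(\bar{x}))$ is not a Kuhn-Tucker point for Problem P'. Then by Lemma 4.1 an optimal solution $d'$ to the problem to minimize $\theta^*(\bar{x},d) + \tfrac{1}{2} d^t B d$ subject to $\bar{x} + d \in X$ must satisfy $\theta^*(\bar{x},d') < -z$ for some $z > 0$. By continuity of $\theta^*$ and by Lemma 4.2, there exists a vector $d$ such that $\theta^*(x_k,d) < -z$ and $x_k + d \in X$ for $k \in \mathcal{K}'$ sufficiently large. By Lemma 2.1, for $\lambda \in (0,1)$ we have:

$$ \theta^*(x_k,\lambda d) + \tfrac{1}{2} \lambda^2 d^t B_k d \le \lambda \theta^*(x_k,d) + \tfrac{1}{2} \lambda^2 d^t B_k d < -\lambda z + \tfrac{1}{2} \lambda^2 \delta_2 \|d\|^2 $$

Let $\hat{\lambda} = \min \{ 1, z / (\delta_2 \|d\|^2) \}$. Then $\theta^*(x_k,\hat{\lambda} d) + \tfrac{1}{2} \hat{\lambda}^2 d^t B_k d \le -h$, where

$$ h = \begin{cases} z - \tfrac{1}{2} \delta_2 \|d\|^2 & \text{if } z \ge \delta_2 \|d\|^2 \\ \dfrac{1}{2} \dfrac{z^2}{\delta_2 \|d\|^2} & \text{if } z < \delta_2 \|d\|^2 \end{cases} $$

We have thus constructed a vector $\hat{d} = \hat{\lambda} d$ so that $x_k + \hat{d} \in X$ for large $k \in \mathcal{K}'$ and furthermore $\theta^*(x_k,\hat{d}) + \tfrac{1}{2} \hat{d}^t B_k \hat{d} \le -h < 0$. But since $d_k$ solves Problem $D(x_k)$, then $\theta^*(x_k,d_k) + \tfrac{1}{2} d_k^t B_k d_k \le -h$. Letting $k$ in $\mathcal{K}'$ approach $\infty$ and noting that $\bar{d} = 0$, it follows that $0 \le -h$. This contradiction proves part (2).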


Corollary

If the accumulation point $\bar{x}$ is a strong local minimum for Problem P, then

$$ \lim_{k \to \infty} x_k = \bar{x}. $$

Proof

Let $\gamma > 0$ be the number given in the definition of a strong local minimum. Fix $0 < \delta < \gamma/2$. We will show that there exists an $\ell$ such that $\|x_k - \bar{x}\| < \delta$ for all $k \ge \ell$, which proves the result. Since $d_k \to 0$ and $x_k \to \bar{x}$ for $k \in \mathcal{K}$, then there is an $\ell \in \mathcal{K}$ such that

$$ \|x_\ell - \bar{x}\| < \delta, \quad \theta(x_\ell) - \theta(\bar{x}) < z(\delta), \quad \|d_k\| < \gamma/2 \ \text{ for all } k \ge \ell \qquad (4.5) $$

We show the desired result by induction. For $k = \ell$, the result immediately follows from (4.5). Now let $k \ge \ell$ and suppose that $\|x_k - \bar{x}\| < \delta$ and note that:

$$ \|x_{k+1} - \bar{x}\| \le \|x_{k+1} - x_k\| + \|x_k - \bar{x}\| \le \|d_k\| + \delta < \gamma/2 + \gamma/2 = \gamma \qquad (4.6) $$

Further, since $\theta(x_{k+1}) < \theta(x_k)$, from (4.5) it follows that $\theta(x_{k+1}) - \theta(\bar{x}) < z(\delta)$. In view of (4.6) and (4.4) it is then clear that $\|x_{k+1} - \bar{x}\| < \delta$. This completes the induction argument.

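It may be noted that if the directions generated by the algorithm do not converge to zero, then $\theta(x_k) \to -\infty$ so that the problem has an unbounded solution. This follows by noting that $\theta$ is decreasing and that if there exists a set of positive integers $\mathcal{K}$ so that $\|d_k\| \ge \epsilon > 0$ for $k \in \mathcal{K}$, then

$$ \theta(x_{k+1}) - \theta(x_k) \le -(\tfrac{1}{2})^{m_k+1} \delta_1 \|d_k\|^2 \le -(\tfrac{1}{2})^{m_k+1} \delta_1 \epsilon^2 < 0 \quad \text{for each } k \in \mathcal{K} $$

where $m_k$ is bounded above by Lemma 3.1.

If $\{x: \theta(x) \le \theta(x_1), x \in X\}$ is compact then $\{x_k\}$ has an accumulation point. If the functions $f$ and $\beta_{ij}$ for all $i,j$ are convex, then every accumulation point is an optimal solution to Problem P.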
5. SPECIAL CASES

In this section, we discuss various specializations of the algorithm to unconstrained and constrained nonlinear programming problems.

Unconstrained Nonlinear Programming

Here we let $X = R^n$ and $\alpha_j(x) = 0$ for $j = 1,\dots,\ell$. Under different choices of $B_k$ our algorithm produces various methods for solving the problem to minimize $f(x)$ subject to $x \in R^n$.

Steepest Descent Methods

At any iteration $k$ the direction-finding Problem $D(x_k)$ is to minimize $\nabla f(x_k)^t d + \tfrac{1}{2} d^t B_k d$. The following choices of $B_k$ are examined. For each of these choices all entries of $B_k$ are uniformly bounded so that any sequence $\{B_k\}$ has a convergent subsequence as needed in Theorem 4.1.

Steepest Descent Under the Euclidean-Norm

Let $B_k = I$. Here $d_k = -\nabla f(x_k)$ and $d_k^t B_k d_k = \|\nabla f(x_k)\|^2$, where $\|\cdot\|$ denotes the Euclidean norm. Note that $\theta^*(x_k,d_k) = \nabla f(x_k)^t d_k = -\|d_k\|^2$ so that Step 2 of the algorithm is never used by letting $\delta_1 = 1$. In this case, our algorithm reduces to that of Armijo [1].
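
With $B_k = I$ the whole iteration fits in a short loop. The sketch below is a minimal illustration under our own assumptions: the function name `steepest_descent_armijo`, the tolerance, the iteration caps, and the quadratic test function are not from the text. The acceptance test is the Step 3 rule with $\theta = f$ and $\theta^*(x_k,d_k) = -\|d_k\|^2$.

```python
def steepest_descent_armijo(f, grad, x, iters=1000, tol=1e-10, max_halvings=60):
    """Armijo steepest descent: d = -grad f(x), step (1/2)**m with the
    smallest m such that f(x + step*d) - f(x) <= 0.5 * step * grad^t d."""
    for _ in range(iters):
        g = grad(x)
        d = [-gi for gi in g]
        theta_star = sum(gi * di for gi, di in zip(g, d))  # equals -||d||^2
        if -theta_star <= tol:
            break  # gradient is numerically zero
        fx = f(x)
        for m in range(max_halvings + 1):
            step = 0.5 ** m
            trial = [xi + step * di for xi, di in zip(x, d)]
            if f(trial) - fx <= 0.5 * step * theta_star:
                x = trial
                break
        else:
            break  # no acceptable step found; stop rather than loop forever
    return x

# Illustrative ill-conditioned quadratic with minimizer at the origin
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)

def grad(x):
    return [x[0], 10.0 * x[1]]

x_min = steepest_descent_armijo(f, grad, [1.0, 1.0])
print(x_min)
```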

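Steepest Descent Under the Sup-Norm

Let $B_k$ be a diagonal matrix whose $i$th diagonal entry $b_i$ is given by

$$ b_i = \frac{|\partial f(x_k)/\partial x_i|}{\|\nabla f(x_k)\|_1}, \quad i = 1,\dots,n $$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm. Note that $B_k$ is positive semidefinite. An optimal solution $d_k$ to Problem $D(x_k)$ is given by

$$ d_{ik} = \begin{cases} -\|\nabla f(x_k)\|_1 & \text{if } \partial f(x_k)/\partial x_i > 0 \\ \|\nabla f(x_k)\|_1 & \text{if } \partial f(x_k)/\partial x_i < 0 \\ 0 & \text{if } \partial f(x_k)/\partial x_i = 0 \end{cases} $$

Note that $\theta^*(x_k,d_k) = -\|\nabla f(x_k)\|_1^2 = -\|d_k\|_s^2$, where $\|\cdot\|_s$ denotes the sup-norm. If we let $\delta_1 = 1$, it is clear that Step 2 of the algorithm is never used.

Steepest Descent Under the $\ell_1$-Norm

Let $\|\cdot\|_s$ denote the sup-norm and let

$$ c_i = \frac{\partial f(x_k)/\partial x_i}{\|\nabla f(x_k)\|_s}, \quad i = 1,\dots,n $$

Let $I = \{i: |c_i| = 1\}$, and without loss of generality suppose that $I = \{1,\dots,\nu\}$. Let

$$ d^t = (c_1,\dots,c_\nu), \qquad e^t = (c_{\nu+1},\dots,c_n) $$

Now consider the matrix $B_k$ given below, whose first block row has $\nu$ rows and whose second block row has $n-\nu$ rows:

$$ B_k = \begin{pmatrix} d d^t & 0 \\ e d^t & \frac{n-\nu}{4} I \end{pmatrix} $$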
We will demonstrate that $B_k$ is positive semidefinite, give the form of an optimal solution which turns out to be a steepest descent direction under the $\ell_1$-norm, and then show that Step 2 of the algorithm is not needed. Let $y$ and $z$ be arbitrary vectors in $R^\nu$ and $R^{n-\nu}$. Then:

$$ (y^t, z^t) B_k \begin{pmatrix} y \\ z \end{pmatrix} = y^t d d^t y + z^t e d^t y + \frac{n-\nu}{4} z^t z $$

Denote $y^t d$ by $\alpha$ and $z^t e$ by $\beta$. Then, the above equation yields:

$$ (y^t, z^t) B_k \begin{pmatrix} y \\ z \end{pmatrix} = (\alpha + \tfrac{1}{2}\beta)^2 - \tfrac{1}{4}\beta^2 + \frac{n-\nu}{4} z^t z \qquad (5.1) $$

By the Schwarz inequality and noting that the absolute value of each component of $e$ is less than 1, we have:

$$ \beta^2 \le \|z\|^2 \|e\|^2 < (n-\nu) \|z\|^2 $$

From (5.1) it is then clear that $B_k$ is positive semidefinite. Next note that $d_k$ given below is a solution to the system $\nabla f(x_k) + B_k d = 0$, which shows that under this particular choice of $B_k$, our quadratic program yields a steepest descent direction under the $\ell_1$-norm.

$$ d_{ik} = \begin{cases} -\dfrac{1}{\nu} \dfrac{\partial f(x_k)}{\partial x_i} & i = 1,\dots,\nu \\ 0 & i = \nu+1,\dots,n \end{cases} $$

Finally, note that

$$ \theta^*(x_k,d_k) = \nabla f(x_k)^t d_k = -\frac{1}{\nu} \sum_{i=1}^{\nu} \left| \frac{\partial f(x_k)}{\partial x_i} \right|^2 = -\|d_k\|_1^2 $$

where $\|\cdot\|_1$ denotes the $\ell_1$-norm. Therefore, Step 2 of the algorithm is not needed by letting $\delta_1 = 1$.
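
The two closed-form directions above are easy to compute and to sanity-check. The sketch below is illustrative: the function names and the sample gradient are assumptions, and ties in the largest gradient component are detected by exact floating-point equality, which is adequate only for this toy example. The checks are $\nabla f^t d = -\|d\|_s^2$ for the sup-norm direction and $\nabla f^t d = -\|d\|_1^2$ for the $\ell_1$-norm direction.

```python
def supnorm_direction(g):
    """d_i = -sign(g_i) * ||g||_1 (steepest descent under the sup-norm)."""
    n1 = sum(abs(gi) for gi in g)
    return [-((gi > 0) - (gi < 0)) * n1 for gi in g]

def l1norm_direction(g):
    """d_i = -g_i / nu on the nu components of largest magnitude, else 0."""
    gmax = max(abs(gi) for gi in g)
    active = [i for i, gi in enumerate(g) if abs(gi) == gmax]
    nu = len(active)
    return [-(g[i] / nu) if i in active else 0.0 for i in range(len(g))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

g = [3.0, -1.0, 3.0]            # sample gradient (illustrative)
d_sup = supnorm_direction(g)
d_l1 = l1norm_direction(g)

sup_norm_of_d_sup = max(abs(di) for di in d_sup)
l1_norm_of_d_l1 = sum(abs(di) for di in d_l1)
print(d_sup, dot(g, d_sup), -sup_norm_of_d_sup ** 2)
print(d_l1, dot(g, d_l1), -l1_norm_of_d_l1 ** 2)
```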

A Newton-Type Method for Unconstrained Optimization

In [9], Gill and Murray proposed a Newton-type procedure that produces a positive definite matrix $B_k$ through a modified version of Cholesky's factorization of the Hessian $H_k$. If $H_k$ is sufficiently positive definite then $B_k = H_k$. Otherwise $B_k$ is of the form $H_k + E_k$, where $E_k$ is a diagonal matrix with nonnegative elements.

If during the factorization process of $H_k$ into the form $LDL^t$, a diagonal element of $D$ is not sufficiently positive, then it is replaced by a suitable positive scalar $q$. The factorization is stable and can be performed within on the order of $n^3$ multiplications. At the end, $B_k = L_k D_k L_k^t$ is at hand and the search direction $d_k$ is obtained by solving the system $\nabla f(x_k) + B_k d = 0$. One can easily choose the scalar $q$ so that $y^t B_k y \ge 2\delta_1 \|y\|^2$ for any desired $\delta_1$, thus eliminating the need for Step 2 of the algorithm.

The above scheme of Gill and Murray [9] can thus be used in conjunction with our algorithm. If the Hessian at any accumulation point of the method is sufficiently positive definite, this method reduces to Newton's method, and quadratic convergence is assured.

Constrained Nonlinear Programming

Consider the following nonlinear programming problem:

$$ \text{NLP}: \quad \text{minimize } f(x) \quad \text{subject to } g_j(x) \le 0, \quad j = 1,\dots,m $$

Recently, a great deal of attention has been given by many authors to extending quasi-Newton procedures from the unconstrained case so that they can handle problems of the above type. For a review of these methods, the reader is referred to Garcia-Palomares and Mangasarian [8], Han [13], and Powell [20].

A typical method in the class of quasi-Newton methods proceeds as follows. Given $x_k$, let $d_k$ be an optimal solution to the following problem:

$$ D(x_k): \quad \text{minimize } \nabla f(x_k)^t d + \tfrac{1}{2} d^t B_k d $$
$$ \text{subject to } g_j(x_k) + \nabla g_j(x_k)^t d \le 0, \quad j = 1,\dots,m $$

If $x_k$ is sufficiently close to a Kuhn-Tucker point $\bar{x}$ and if $B_k$ is sufficiently close to the Hessian of the Lagrangian at $\bar{x}$, then the algorithm $x_{k+1} = x_k + d_k$ converges to $\bar{x}$ at a superlinear rate.

In [12], Han was able to prove convergence of the procedure starting from points remote from $\bar{x}$. He showed that if $\mu$ is sufficiently large so that $\mu \ge u_j$ for $j = 1,\dots,m$, where $u_j$ is the Lagrangian multiplier associated with the $j$th constraint in Problem $D(x_k)$, then $d_k$ is indeed a descent direction for the penalty function $\phi(x) = f(x) + \mu \sum_{j=1}^{m} \max\{0, g_j(x)\}$ at $x_k$. He was able to show global convergence by letting $x_{k+1} = x_k + \lambda_k d_k$, where $\lambda_k$ essentially solves the problem to minimize $\phi(x_k + \lambda d_k)$ subject to $0 \le \lambda \le \delta$, where $\delta > 0$ is a fixed number.

We will now show that our minimax algorithm specializes to Han's method and extends it in two ways. First, rather than performing a line search, our procedure uses the easily implementable Armijo's search. In [12], Han suggested that it is of some practical value to devise such an approximate search procedure for the nondifferentiable function $\phi$. Second, a typical quasi-Newton method could stop prematurely if Problem $D(x_k)$ has an empty feasible region, that is, if there exists no vector $p$ such that $\nabla g_j(x_k)^t p < 0$ for $j \in I$, where $I = \{j: g_j(x_k) > 0\}$. As will be seen shortly, our direction-finding problem is always feasible, and furthermore it reduces to Problem $D(x_k)$ if the latter is feasible.

Note that Problem NLP can be put in the minimax format as follows. Let $\ell = m$ and let $\alpha_j(x) = \mu \max\{0, g_j(x)\}$, where $\mu$ is an exact penalty parameter. Then Problem P becomes:

$$ \text{minimize } f(x) + \mu \sum_{j=1}^{m} \max\{0, g_j(x)\} $$

At any particular iteration, our direction-finding problem reduces to:

$$ D'(x_k): \quad \text{minimize } \nabla f(x_k)^t d + \mu \sum_{j=1}^{m} y_j + \tfrac{1}{2} d^t B_k d $$
$$ \text{subject to } g_j(x_k) + \nabla g_j(x_k)^t d \le y_j, \quad j = 1,\dots,m $$
$$ y_j \ge 0, \quad j = 1,\dots,m $$

The relationship between Problems $D(x_k)$ and $D'(x_k)$ is given by Lemma 5.1 below.
Lemma 5.1

If Problem $D(x_k)$ is not feasible then any feasible point $(d,y)$ to Problem $D'(x_k)$ must have $\sum_{j=1}^{m} y_j > 0$. Now suppose that $B_k$ is positive semidefinite and symmetric. Further suppose that Problem $D(x_k)$ is feasible and that it has $(d_k,u)$ as a Kuhn-Tucker solution. If $\mu \ge u_j$ for $j = 1,\dots,m$, then $(d_k, y=0)$ is an optimal solution to Problem $D'(x_k)$. Further, if $B_k$ is positive definite, then any optimal solution $(\hat{d},\hat{y})$ to Problem $D'(x_k)$ must satisfy $\hat{y} = 0$ and $\hat{d} = d_k$.

Proof

Obviously, if Problem $D(x_k)$ is not feasible then any feasible point $(d,y)$ to Problem $D'(x_k)$ must satisfy $\sum_{j=1}^{m} y_j > 0$. Now suppose that $(d_k,u)$ is a Kuhn-Tucker solution to Problem $D(x_k)$. Then:

$$ \nabla f(x_k) + B_k d_k + \sum_{j=1}^{m} u_j \nabla g_j(x_k) = 0 $$
$$ u_j [g_j(x_k) + \nabla g_j(x_k)^t d_k] = 0, \quad j = 1,\dots,m $$
$$ g_j(x_k) + \nabla g_j(x_k)^t d_k \le 0, \quad j = 1,\dots,m \qquad (5.1) $$
$$ u_j \ge 0, \quad j = 1,\dots,m $$

But $(d,y)$ is a Kuhn-Tucker solution to Problem $D'(x_k)$ if there exists a vector $v$ such that

$$ \nabla f(x_k) + B_k d + \sum_{j=1}^{m} v_j \nabla g_j(x_k) = 0 $$
$$ \mu - v_j \ge 0, \quad j = 1,\dots,m $$
$$ v_j [g_j(x_k) + \nabla g_j(x_k)^t d - y_j] = 0, \quad j = 1,\dots,m \qquad (5.2) $$
$$ g_j(x_k) + \nabla g_j(x_k)^t d - y_j \le 0, \quad j = 1,\dots,m $$
$$ v_j \ge 0, \quad j = 1,\dots,m $$
$$ (\mu - v_j) y_j = 0, \quad j = 1,\dots,m $$

Noting that $\mu \ge u_j$, it follows that the system defined by (5.2) holds by letting $d = d_k$, $y = 0$, and $v = u$. By convexity of Problem $D'(x_k)$ it follows that $(d_k, y=0)$ is indeed an optimal solution.

Now suppose that $B_k$ is positive definite and let $(\hat{d},\hat{y})$ be an optimal solution to Problem $D'(x_k)$. Therefore $\lambda(\hat{d},\hat{y}) + (1-\lambda)(d_k,0)$ is also an optimal solution for all $\lambda \in (0,1)$. This further implies that $\Psi(\lambda)$ defined below is constant for all $\lambda \in (0,1)$:

$$ \Psi(\lambda) = \nabla f(x_k)^t d_k + \lambda \nabla f(x_k)^t (\hat{d} - d_k) + \lambda \mu \sum_{j=1}^{m} \hat{y}_j + \tfrac{1}{2} d_k^t B_k d_k + \tfrac{1}{2} \lambda^2 (\hat{d} - d_k)^t B_k (\hat{d} - d_k) + \lambda (\hat{d} - d_k)^t B_k d_k $$

This implies that $\Psi'(\lambda) = 0$ for $\lambda \in (0,1)$ and hence

$$ \nabla f(x_k)^t (\hat{d} - d_k) + (\hat{d} - d_k)^t B_k d_k + \mu \sum_{j=1}^{m} \hat{y}_j + \lambda (\hat{d} - d_k)^t B_k (\hat{d} - d_k) = 0 \quad \text{for all } \lambda \in (0,1) \qquad (5.3) $$

But this is possible only if $(\hat{d} - d_k)^t B_k (\hat{d} - d_k) = 0$, and since $B_k$ is positive definite, we must have $\hat{d} = d_k$. From (5.3) we have $\hat{y} = 0$ and the proof is complete.

The above lemma shows that if $B_k$ is positive definite and if $\mu$ is sufficiently large, then an optimal solution to Problem $D'(x_k)$ has $\sum_{j=1}^{m} y_j > 0$ only if Problem $D(x_k)$ is not feasible. To illustrate, consider the problem to minimize $f(x)$ subject to $g(x) \le 0$, where $f(x) = (x-2)^2$ and


g(x) 


x  <  ! 
otherwise 


If the starting solution is $x_1 = 1$, then Problem $D(x_1)$ is infeasible and the quasi-Newton method would stop prematurely at the infeasible point $x_1$. Our minimax algorithm will not stop at this point and would eventually converge to the optimal solution $\bar{x} = 0$. It is thus proposed that quasi-Newton methods should solve Problem $D'(x_k)$ rather than Problem $D(x_k)$ in order to find a search direction $d_k$.
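
For a single variable and a single constraint, Problem $D'(x_k)$ can be solved by inspection, which makes the feasibility claim easy to test. The sketch below is a hedged illustration: the solver name `solve_d_prime_1d` and the two constraint functions are our own assumptions and are not the $g$ of the example above. Eliminating $y = \max\{0, g + g' d\}$ leaves a convex piecewise quadratic in $d$ (for $b > 0$) whose minimizer is a stationary point of one of its two pieces or the breakpoint, so it suffices to compare those candidates.

```python
def solve_d_prime_1d(df, g, dg, b, mu):
    """Minimize df*d + mu*y + 0.5*b*d**2 subject to g + dg*d <= y, y >= 0,
    for scalar d with b > 0. Returns (d, y)."""
    def phi(d):
        # Objective after eliminating y = max(0, g + dg*d)
        return df * d + mu * max(0.0, g + dg * d) + 0.5 * b * d * d

    candidates = [-df / b, -(df + mu * dg) / b]   # stationary points of each piece
    if dg != 0.0:
        candidates.append(-g / dg)                # breakpoint of the max term
    d = min(candidates, key=phi)
    return d, max(0.0, g + dg * d)

df_at_1 = -2.0   # derivative of f(x) = (x - 2)^2 at x = 1

# Case 1 (hypothetical g(x) = 1 - (x - 1)^2): g(1) = 1, g'(1) = 0, so the
# linearized constraint 1 + 0*d <= 0 of Problem D is infeasible.
d_infeas, y_infeas = solve_d_prime_1d(df_at_1, 1.0, 0.0, b=1.0, mu=10.0)

# Case 2 (hypothetical g(x) = x): Problem D is feasible with solution d = -1
# and multiplier u = 3 <= mu, so D' should return the same d with y = 0.
d_feas, y_feas = solve_d_prime_1d(df_at_1, 1.0, 1.0, b=1.0, mu=10.0)
print(d_infeas, y_infeas, d_feas, y_feas)
```

In the first case the direction-finding problem still returns a direction with $y > 0$, and in the second it reproduces the solution of Problem $D(x_k)$, in line with Lemma 5.1.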
AN ALGORITHM FOR LINEARLY CONSTRAINED
NONLINEAR PROGRAMMING PROBLEMS

Mokhtar S. Bazaraa and Jamie J. Goode

In  this  paper  an  algorithm  for  solving  a  linearly  constrained  nonlinear 
programming  problem  is  developed.  Given  a  feasible  point,  a  correction  vector 
is  computed  by  solving  a  least  distance  programming  problem  over  a  polyhedral 
cone  defined  in  terms  of  the  gradients  of  the  "almost"  binding  constraints. 
Mukai's approximate scheme for computing step sizes is generalized to handle
the constraints. This scheme provides an estimate for the step size based on
a quadratic approximation of the function. This estimate is used in conjunction
with Armijo line search to calculate a new point. It is shown that each
accumulation  point  is  a  Kuhn-Tucker  point  to  a  slight  perturbation  of  the 
original  problem.  Furthermore,  under  suitable  second  order  optimality  condi¬ 
tions,  it  is  shown  that  eventually  only  one  trial  is  needed  to  compute  the 
step  size. 
1. Introduction

This paper addresses the following linearly constrained nonlinear programming problem:

$$ P: \quad \text{minimize } f(x) \quad \text{subject to } Ax \le b $$

where $f$ is a twice continuously differentiable function on $R^n$, and $A$ is an $\ell \times n$ matrix whose $j$th row is denoted by $a_j^t$, and where a superscript $t$ denotes the transpose operation.

There  are  several  approaches  for  solving  this  problem.  The  first  one 
relies  on  partitioning  the  variables  into  basic,  nonbasic,  and  superbasic 
variables.  The  values  of  the  superbasic  and  basic  variables  are  modified 
while  the  nonbasic  variables  are  fixed  at  their  current  values.  Examples  of 
methods in this class are the convex simplex method of Zangwill [18], the
reduced gradient method of Wolfe [17], the method of Murtagh and Saunders [12],
and the variable reduction method of McCormick [8].

Another  class  of  methods  is  the  extension  of  quasi-Newton  algorithms  from 
unconstrained  to  constrained  optimization.  Here,  at  any  iteration,  a  set  of 
active  restrictions  is  identified,  and  then  a  modified  Newton  procedure  is 
used  to  minimize  the  objective  function  on  the  manifold  defined  by  these  active 
constraints.  See  for  example  Goldfarb  [6],  and  Gill  and  Murray  [5]. 

Other  approaches  for  solving  problems  with  linear  constraints  are  the 
gradient  projection  method  and  the  method  of  feasible  directions.  The  former 
computes  a  direction  by  projecting  the  negative  gradient  on  the  space  ortho¬ 
gonal  to  the  gradients  of  a  subset  of  the  binding  constraints  while  the  latter 
method  determines  a  search  direction  by  solving  a  linear  programming  problem. 
For a review of these methods the reader may refer to Rosen [14], Zoutendijk
[19], Frank and Wolfe [4], and Topkis and Veinott [15].

In  this  paper,  an  algorithm  for  solving  problem  P  is  proposed.  At  each 
iteration  a  correction  vector  is  computed  by  finding  the  minimum  distance 
from  a  given  point  to  a  polyhedral  cone  defined  in  terms  of  the  gradients 
of  the  "almost"  binding  constraints.  An  approximate  line  search  procedure 
which  extends  those  of  Armijo  [1]  and  Mukai  [10,  11]  for  unconstrained  opti¬ 
mization  is  developed  for  determining  the  step  size.  First,  an  estimate  of 
the  step  size  based  on  a  quadratic  approximation  to  the  objective  function  is 
computed, and then adjusted if necessary.

In  Section  2,  we  outline  the  algorithm.  In  Section  3,  we  show  that 
accumulation  points  of  the  algorithm  are  Kuhn-Tucker  points  to  a  slight  per¬ 
turbation  of  the  original  problem.  Finally,  in  Section  4,  assuming  that  the 
algorithm  converges,  and  under  suitable  second  order  sufficiency  optimality 
conditions,  we  show  that  the  step  size  estimates  which  are  based  on  the  quad¬ 
ratic  approximation  are  acceptable  so  that  only  one  functional  evaluation  is 
eventually  needed  for  performing  the  line  search. 

2.  Statement  of  the  Algorithm 
Consider  the  following  algorithm  for  solving  Problem  P. 

Step  0 

Choose  values  for  the  parameters  c,  z,  6,  and  e.  Select  a  point  Xq  such  that 
AXq  <_  b  and  let  <5^  =  6.  Let  i  =  0  and  go  to  Step  1. 

Step 1

Let $w_i$ be the optimal solution to Problem $D(x_i)$ given below:

$D(x_i)$:  minimize  $\nabla f(x_i)^t w + \tfrac{1}{2} z\, w^t w$

subject to  $a_j^t w \le 0$  for $j \in I(x_i)$

where

$I(x_i) = \{j : a_j^t x_i > b_j - c\}$  (2.1)

If $w_i = 0$, stop. Else, go to Step 2.
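
Step 1 can be illustrated with a small pure-Python sketch. It is not the solution method proposed in the paper: as an assumption for illustration, Problem $D(x_i)$ is solved approximately by coordinate descent on its dual, using the fact that the optimal $w$ has the form $-(\nabla f(x_i) + \sum_j u_j a_j)/z$ with $u_j \ge 0$. The function names, the `sweeps` parameter, and the representation of $A$ as a list of rows are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def almost_binding(A, b, x, c):
    """Index set I(x) of (2.1): constraints with a_j^t x > b_j - c."""
    return [j for j in range(len(A)) if dot(A[j], x) > b[j] - c]


def solve_direction(grad, A, active, z, sweeps=200):
    """Approximate minimizer of grad^t w + (z/2) w^t w
    subject to a_j^t w <= 0 for j in `active`.

    Assumption: dual coordinate descent is used here only as a simple
    stand-in for a least-distance programming method.
    """
    u = {j: 0.0 for j in active}   # multipliers, kept nonnegative
    r = list(grad)                 # r = grad + sum_j u_j a_j
    for _ in range(sweeps):
        for j in active:
            a = A[j]
            aa = dot(a, a)
            if aa == 0.0:
                continue
            # exact minimization over u_j alone, clipped at zero
            new_u = max(0.0, u[j] - dot(a, r) / aa)
            step = new_u - u[j]
            u[j] = new_u
            r = [rk + step * ak for rk, ak in zip(r, a)]
    return [-rk / z for rk in r]


# Usage: one almost-binding constraint x_1 <= 1 and gradient (-1, -1).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 1.0]
active = almost_binding(A, b, [0.95, 0.0], 0.1)
w = solve_direction([-1.0, -1.0], A, active, 1.0)
print(active, w)
```

In this example the unconstrained step $-\nabla f/z = (1, 1)$ is projected onto the cone $\{w : w_1 \le 0\}$, so the component pointing into the almost-binding constraint is removed.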

Step 2
Let

$I^+(w_i) = \{j : a_j^t w_i > 0\}$  (2.2)

and let

$\beta_i = \min \{1,\ (b_j - a_j^t x_i)/(a_j^t w_i) \text{ for } j \in I^+(w_i)\}$  (2.3)

Let

$d_i = \beta_i w_i$  (2.4)

and go to Step 3.
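
The ratio test of Step 2 can be sketched as follows. This is a minimal illustration under the assumption that $A$ is stored as a list of rows and that the current point is feasible; the function name `feasible_direction` is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def feasible_direction(A, b, x, w):
    """Return (beta, d) with d = beta * w, following (2.2)-(2.4).

    beta is the largest fraction in (0, 1] of w that keeps
    A(x + beta * w) <= b, assuming A x <= b already holds.
    """
    beta = 1.0
    for a, bj in zip(A, b):
        aw = dot(a, w)
        if aw > 0.0:                      # j belongs to I+(w)
            beta = min(beta, (bj - dot(a, x)) / aw)
    d = [beta * wk for wk in w]
    return beta, d


# Usage: constraint x_1 <= 1, current point (0.5, 0), raw direction (2, 1).
beta, d = feasible_direction([[1.0, 0.0]], [1.0], [0.5, 0.0], [2.0, 1.0])
print(beta, d)
```

Only constraints with $a_j^t w_i > 0$ can be violated by moving along $w_i$, which is why the loop skips all others.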

Step 3
If

$f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i) \ge \varepsilon^2 \delta_i \|d_i\|^2$  (2.5)

let

$\lambda_i = \dfrac{-\varepsilon^2 \nabla f(x_i)^t d_i}{f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i)}$  (2.6)

and let $\delta_{i+1} = \delta_i$ and go to Step 4. Otherwise, let $\lambda_i = 1$, $\delta_{i+1} = \gamma \delta_i$,
and go to Step 4.
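
A minimal sketch of Step 3 follows, assuming $d_i \ne 0$ and $\delta_i > 0$ so that a passed test (2.5) implies a positive denominator in (2.6). The function name and the argument `shrink` (standing for the factor applied to $\delta_i$ when the test fails, assumed to lie in $(0, 1)$) are hypothetical.

```python
def step_estimate(f, grad_x, x, d, eps, delta, shrink):
    """Return (lam, next_delta) following test (2.5) and formula (2.6).

    f      : objective, called on a list of floats
    grad_x : gradient of f at x
    eps    : finite-difference parameter (epsilon in the text)
    delta  : current curvature threshold delta_i
    shrink : assumed factor in (0, 1) applied to delta_i on failure
    """
    x_plus = [xk + eps * dk for xk, dk in zip(x, d)]
    x_minus = [xk - eps * dk for xk, dk in zip(x, d)]
    curvature = f(x_plus) + f(x_minus) - 2.0 * f(x)
    d_norm_sq = sum(dk * dk for dk in d)
    if curvature >= eps ** 2 * delta * d_norm_sq:      # test (2.5)
        slope = sum(gk * dk for gk, dk in zip(grad_x, d))
        return -eps ** 2 * slope / curvature, delta    # formula (2.6)
    return 1.0, shrink * delta


# Usage on f(x) = x_1^2 + x_2^2 at x = (1, 0) with d = (-1, 0).
f = lambda v: v[0] ** 2 + v[1] ** 2
lam, next_delta = step_estimate(f, [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0],
                                0.1, 0.5, 0.5)
print(lam, next_delta)
```

For a quadratic objective the central difference is exact, so the estimate coincides with the exact minimizing step along $d_i$.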

Step 4
Let

$\alpha_i = \min \{1, \lambda_i\}$  (2.7)

and compute the smallest nonnegative integer $k$ satisfying

$f(x_i + (\tfrac{1}{2})^k \alpha_i d_i) - f(x_i) \le \tfrac{1}{3} (\tfrac{1}{2})^k \alpha_i \nabla f(x_i)^t d_i$  (2.8)

Let $k_i = k$, $x_{i+1} = x_i + \alpha_i (\tfrac{1}{2})^{k_i} d_i$, $i = i + 1$, and go to Step 1.
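
The backtracking loop of Step 4 can be sketched as below. The constant 1/3 and the halving factor follow test (2.8); the function name `armijo_step` and the safeguard `max_k` are assumptions added for illustration.

```python
def armijo_step(f, grad_x, x, d, lam, max_k=60):
    """Return (k, x_next) following (2.7) and (2.8).

    lam   : step size estimate from Step 3
    max_k : hypothetical safeguard on the number of halvings
    """
    alpha = min(1.0, lam)                              # (2.7)
    fx = f(x)
    slope = sum(gk * dk for gk, dk in zip(grad_x, d))
    for k in range(max_k + 1):
        t = alpha * 0.5 ** k
        x_next = [xk + t * dk for xk, dk in zip(x, d)]
        if f(x_next) - fx <= t * slope / 3.0:          # test (2.8)
            return k, x_next
    raise RuntimeError("no acceptable step found; is d a descent direction?")


# Usage on f(x) = x_1^2 + x_2^2 at x = (1, 0).
f = lambda v: v[0] ** 2 + v[1] ** 2
print(armijo_step(f, [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0], 1.0))
print(armijo_step(f, [2.0, 0.0], [1.0, 0.0], [-4.0, 0.0], 1.0))
```

Because $\alpha_i \le 1$ and $x_i + d_i$ is feasible by Step 2, every trial point in the loop stays feasible for linear constraints, so the loop only has to check the decrease condition.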

The  following  remarks  are  helpful  in  interpreting  the  above  algorithm. 


1. A direction is determined by solving Problem $D(x_i)$. This problem
finds the point in the convex polyhedral cone $\{w : a_j^t w \le 0 \text{ for } j \in I(x_i)\}$
which is closest to the vector $-\tfrac{1}{z} \nabla f(x_i)$. Methods of least distance
programming, as in the works of Bazaraa and Goode [2], and Wolfe [16] can be
used for solving this problem. Special methods that take advantage of the
structure of the cone constraints may prove quite useful in this regard.
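
The least-distance interpretation in Remark 1 can be checked by completing the square in the objective of Problem $D(x_i)$; the short derivation below is added for illustration and uses only the quantities already defined, assuming $z > 0$.

```latex
\nabla f(x_i)^t w + \tfrac{1}{2} z\, w^t w
  \;=\; \tfrac{z}{2} \left\| w + \tfrac{1}{z} \nabla f(x_i) \right\|^2
        \;-\; \tfrac{1}{2z} \left\| \nabla f(x_i) \right\|^2 .
```

The last term does not depend on $w$, so minimizing the left-hand side over the cone is the same as minimizing the distance from $w$ to $-\tfrac{1}{z}\nabla f(x_i)$ over the cone.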


2. The restrictions enforced in Problem $D(x_i)$ are the c-binding constraints
at $x_i$, that is, those satisfying $b_j - c < a_j^t x_i \le b_j$. If $w_i = 0$, then the
algorithm is terminated with $x_i$. In this case, from the Kuhn-Tucker
conditions for Problem $D(x_i)$, there exist $u_j$ for $j \in I(x_i)$ such that:

$\nabla f(x_i) + \sum_{j \in I(x_i)} u_j a_j = 0$

$u_j \ge 0$  for $j \in I(x_i)$.

These conditions imply that $x_i$ is a Kuhn-Tucker point for the following
problem:

minimize  $f(x)$

subject to  $a_j^t x \le a_j^t x_i$  for $j \in I(x_i)$
            $a_j^t x \le b_j$  for $j \notin I(x_i)$

Noting that $b_j - c < a_j^t x_i \le b_j$ for $j \in I(x_i)$, if $c$ is sufficiently small, it
is clear that the algorithm is terminated if $x_i$ is a Kuhn-Tucker solution to
a slightly perturbed version of Problem P. The following definition will
thus be useful.


Definition 2.1

Let $x^*$ be a feasible point to Problem P. If the optimal solution to Problem
$D(x^*)$ is equal to zero, then $x^*$ is called a c-KT solution to Problem P.

3. If $x_i + w_i$ is feasible to Problem P, then the search vector $d_i$ is taken
as $w_i$. Otherwise, $d_i$ is taken to be the vector of maximum length along $w_i$
which maintains feasibility of $x_i + d_i$.

4. Steps 3 and 4 of the algorithm compute the step size taken along the
vector $d_i$ in order to form $x_{i+1}$. As proposed by Mukai [10, 11], first an
estimate of the step size $\lambda_i$ is calculated. When appropriate, $\lambda_i$ is computed
by utilizing a quadratic approximation of the function $f$ at $x_i$, otherwise $\lambda_i$
is taken equal to 1. In order to ensure feasibility to Problem P, the first
trial step size used in conjunction with the Armijo line search [1] is the
minimum of $\lambda_i$ and 1. As will be shown in Section 4, under suitable
assumptions, for large $i$, test (2.5) passes, $k_i = 0$, and $\alpha_i = \lambda_i < 1$. This confirms
the efficiency of the line search scheme, where eventually only one trial is
needed to compute the step size.
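
The estimate (2.6) can be motivated as follows. The derivation is a sketch added for illustration; the model $\varphi$ and the symbol $H(x_i)$ for the Hessian of $f$ at $x_i$ are notation assumed here, with $f$ taken to be twice continuously differentiable.

```latex
% Quadratic model of f along d_i
\varphi(\lambda) = f(x_i) + \lambda \nabla f(x_i)^t d_i
                 + \tfrac{1}{2} \lambda^2 d_i^t H(x_i) d_i ,
\qquad
\varphi'(\lambda) = 0 \;\Longrightarrow\;
\lambda = \frac{-\nabla f(x_i)^t d_i}{d_i^t H(x_i) d_i}
\quad \text{if } d_i^t H(x_i) d_i > 0 .

% Central-difference estimate of the curvature term
f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2 f(x_i)
   \approx \varepsilon^2 d_i^t H(x_i) d_i .
```

Substituting the second relation into the first gives formula (2.6), and test (2.5) guards against using the estimate when the measured curvature is too small relative to $\delta_i \|d_i\|^2$.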


3.  Accumulation  Points  of  the  Algorithm 
Theorem 3.1 shows that each accumulation point of the proposed algorithm is
a c-KT point. In order to prove this theorem, Lemmas 3.1 and 3.2 are needed.
These two lemmas extend similar results of Mukai [10] for unconstrained problems.
In order to facilitate the development in this section, the following
notation is used. Let $w(x)$ be the optimal solution to Problem $D(x)$ and let
$\beta(x)$ be as given in (2.3) with $x_i$ replaced with $x$. Finally, let $d(x) = \beta(x) w(x)$.


Lemma 3.1

Suppose that $x^*$ is not a c-KT point for Problem P. Then, there exist scalars
$y$ and $s > 0$ so that $y \le \alpha(x) \le 1$ for each $x$ with $\|x - x^*\| < s$.


Proof

There exists $s_1 > 0$ so that $I(x) = I(x^*)$ for all $\|x - x^*\| < s_1$. Thus, the
feasible region for Problem $D(x)$ is equal to that of Problem $D(x^*)$ for all $x$
satisfying $\|x - x^*\| < s_1$. By continuous differentiability of $f$, it then follows
that $w(\cdot)$ is continuous in $x$ at $x^*$, see for example Daniel [3]. Particularly,
there exists a number $s_2 > 0$ such that $I^+(w(x)) = I^+(w(x^*))$ if $\|x - x^*\| < s_2$.
This together with the continuity of $w(\cdot)$ and the formula for computing $\beta(\cdot)$
imply that $\beta(\cdot)$ is continuous in $x$ at $x^*$. Hence, $d(\cdot)$ is also continuous.
Since $x^*$ is not a c-KT point, then $w(x^*) \ne 0$. Furthermore, $b_j - a_j^t x^* \ge c$ if
$a_j^t w(x^*) > 0$, which implies that $\beta(x^*) > 0$. Therefore $d(x^*) \ne 0$. By continuity
of $\beta(\cdot)$ and $d(\cdot)$ at $x^*$ there exist scalars $q$ and $s > 0$ so that


$\beta(x) \|d(x)\|^2 \ge \tfrac{1}{2} \beta(x^*) \|d(x^*)\|^2$  if $\|x - x^*\| < s$  (3.1)

$f(x + \varepsilon d(x)) + f(x - \varepsilon d(x)) - 2f(x) < q$  if $\|x - x^*\| < s$  (3.2)

Now, let $x$ be such that $\|x - x^*\| < s$. Since $w(x)$ solves Problem $D(x)$, then
$\nabla f(x)^t w(x) \le -\tfrac{1}{2} z \|w(x)\|^2$. This, in turn, implies that $-\nabla f(x)^t d(x) \ge \tfrac{1}{2} z \beta(x) \|d(x)\|^2$
and from (3.1) we get:

$-\nabla f(x)^t d(x) \ge \tfrac{1}{4} z \beta(x^*) \|d(x^*)\|^2 = \gamma > 0$  (3.3)


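If test (2.5) passes, then from (3.2) and (3.3) the following lower bound on
$\lambda(x)$ is at hand:

$\lambda(x) = \dfrac{-\varepsilon^2 \nabla f(x)^t d(x)}{f(x + \varepsilon d(x)) + f(x - \varepsilon d(x)) - 2f(x)} \ge \dfrac{\varepsilon^2 \gamma}{q}$

If test (2.5) fails, then $\lambda(x) = 1$ and hence $\lambda(x) \ge \min \{1, \varepsilon^2 \gamma / q\}$. Since
$\alpha(x) = \min \{1, \lambda(x)\}$, the desired result follows.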


Lemma 3.2

If $x^*$ is not a c-KT point for Problem P, then there exist a number $s > 0$ and
an integer $m$ so that $k(x) \le m$ if $\|x - x^*\| < s$, where $k(x)$ is the Armijo integer
given by (2.8) with $x_i$ and $\alpha_i$ replaced with $x$ and $\alpha(x)$ respectively.

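Proof

As in the proof of Lemma 3.1 and by continuous differentiability of $f$, there
exist scalars $s$, $h$, and $\gamma > 0$ so that for $\|x - x^*\| < s$ the following hold:

$\nabla f(x)^t d(x) \le -\gamma$  (3.4)

$|\nabla f(x + g\, d(x))^t d(x) - \nabla f(x)^t d(x)| \le \tfrac{2}{3} \gamma$  for each $g \in [0, h]$  (3.5)

Now let $m$ be the smallest nonnegative integer so that $(\tfrac{1}{2})^m \le h$ and let $x$ be
such that $\|x - x^*\| < s$. Then, by the mean value theorem, there exists $\theta \in [0, 1]$ such that:

$f(x + (\tfrac{1}{2})^m \alpha(x) d(x)) - f(x) - \tfrac{1}{3} (\tfrac{1}{2})^m \alpha(x) \nabla f(x)^t d(x)$

$= (\tfrac{1}{2})^m \alpha(x) \nabla f(x + \theta (\tfrac{1}{2})^m \alpha(x) d(x))^t d(x) - \tfrac{1}{3} (\tfrac{1}{2})^m \alpha(x) \nabla f(x)^t d(x)$

$= (\tfrac{1}{2})^m \alpha(x) \{ \nabla f(x + \theta (\tfrac{1}{2})^m \alpha(x) d(x))^t d(x) - \nabla f(x)^t d(x) + \tfrac{2}{3} \nabla f(x)^t d(x) \}$  (3.6)

Since $\theta (\tfrac{1}{2})^m \alpha(x) \le h$, (3.4) and (3.5) imply that the right hand side of (3.6)
is $\le 0$, which in turn shows that $k(x) \le m$, and the proof is complete.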

Theorem 3.1

Either the algorithm terminates with a c-KT point for Problem P or else
generates an infinite sequence $\{x_i\}$ of which any accumulation point is a c-KT
point for Problem P.
Proof

Clearly the algorithm stops at $x_i$ only if $x_i$ is a c-KT point. Now, suppose
that the algorithm generates the infinite sequence $\{x_i\}$. Suppose that $x^*$ is
an accumulation point so that $x_i \to x^*$ for $i \in K$, for some infinite set $K$ of positive
integers. Since $f(x_i)$ is decreasing monotonically and since $f(x_i) \to f(x^*)$ for $i \in K$,
then $f(x_i) \to f(x^*)$. Suppose by contradiction to the desired conclusion that
$x^*$ is not a c-KT point. From Lemmas 3.1 and 3.2, there exist positive numbers
$y$ and $\gamma$ and an integer $m$ so that $\alpha_i \ge y$, $\nabla f(x_i)^t d_i \le -\gamma$, and $k_i \le m$ for large
$i$ in $K$. Therefore,

$f(x_{i+1}) - f(x_i) \le \tfrac{1}{3} (\tfrac{1}{2})^{k_i} \alpha_i \nabla f(x_i)^t d_i \le -\tfrac{1}{3} y \gamma (\tfrac{1}{2})^m$

for large $i$ in $K$. This implies that $f(x_i) \to -\infty$, contradicting the fact that
$f(x_i) \to f(x^*)$. This completes the proof.

4. Eventual Acceptance of the Step Size Estimate

In the previous section, we showed that an accumulation point $x^*$ of the
sequence $\{x_i\}$ generated by the algorithm is a KT point to the perturbed
problem P' given below:

P':  minimize  $f(x)$

subject to  $a_j^t x \le a_j^t x^*$  for $j \in I(x^*)$
            $a_j^t x \le b_j$  for $j \notin I(x^*)$

Here, we assume that the whole sequence $\{x_i\}$ converges to a point $x^*$ which
satisfies suitable second order sufficiency conditions. Under this
assumption, we show that test (2.5) is eventually passed. Furthermore, we show
that $\lambda_i < 1$ and that $k_i = 0$ for $i$ large enough.

The second order condition is given in Definition 4.1. It is well-known
that $x^*$ satisfying this condition is a strong local minimum for problem P'.
That is, there exists a number $y > 0$ so that $f(x^*) < f(x)$ if $x \ne x^*$ is feasible
to problem P' and $\|x - x^*\| < y$, see for example McCormick [9] and Han and
Mangasarian [7].

Definition 4.1

Let $x^*$ be such that $Ax^* \le b$ and let $I(x^*) = \{j : a_j^t x^* > b_j - c\}$. $x^*$ is said
to satisfy the second order sufficiency optimality conditions for problem P'
if there exist scalars $u_j \ge 0$ for $j \in I(x^*)$ and $\gamma > 0$ so that:

$\nabla f(x^*) + \sum_{j \in I(x^*)} u_j a_j = 0$

$\nabla f(x^*)^t d \le 0$, $a_j^t d \le 0$ for $j \in I(x^*)$, $\|d\| = 1$ $\Rightarrow$ $d^t H(x^*) d \ge \gamma$  (4.1)

Theorem 4.1 shows that test (2.5) will eventually be passed so that
$\lambda_i$ is given by (2.6). The following two intermediate results are needed to
prove this theorem.

Lemma 4.1

If $Cd \le 0$ and $\|d\| = 1$ imply that $d^t H d \ge \gamma > 0$, then there is a number $\theta > 0$
so that $Cd \le \theta \mathbf{1}$ and $\|d\| = 1$ imply that $d^t H d \ge \gamma/2$.

Proof

Suppose by contradiction that for each integer $k$ there is a vector $d_k$ such
that

$\|d_k\| = 1$, $C d_k \le \tfrac{1}{k} \mathbf{1}$, and $d_k^t H d_k < \gamma/2$  (4.2)

Since the sequence $\{d_k\}$ is bounded, it has an accumulation point $d$. From
(4.2), $\|d\| = 1$, $Cd \le 0$, and $d^t H d \le \gamma/2$, which contradicts the assumption of
the lemma.

Lemma 4.2

If either $\{x_i\}$ converges or $\{x : Ax \le b, f(x) \le f(x_0)\}$ is bounded, then
$d_i \to 0$.

Proof

Since $0 < \beta_i \le 1$ and $d_i = \beta_i w_i$, it suffices to prove that $w_i \to 0$. Suppose,
by contradiction, that there exist an infinite set of positive integers $K$ and a number $\varepsilon > 0$ so
that

$\|w_i\| \ge \varepsilon$  for $i \in K$  (4.3)

Clearly, under either of the assumptions of the lemma, there exist an infinite set
$K' \subset K$ and a point $x^*$ so that $x_i \to x^*$ for $i \in K'$. By Theorem 3.1, $x^*$ is a c-KT point
for Problem P. Thus, $w^* = 0$ is the unique optimal solution to Problem $D(x^*)$.
But for large $i \in K'$, $I(x^*) = I(x_i)$, and by continuity of the solutions to
$D(\cdot)$ we must have $\|w_i\| < \varepsilon/2$ for large $i$ in $K'$. This contradicts (4.3) and
the proof is complete.

Throughout the remainder of this section, the following notation will
be used for any scalar $\mu$:

$H_i^{\mu} = 2 \int_0^1 (1 - y) H(x_i + y \mu d_i)\, dy$  (4.3)

We can integrate by parts to obtain

$f(x_i + \mu d_i) - f(x_i) = \mu \nabla f(x_i)^t d_i + \tfrac{1}{2} \mu^2 d_i^t H_i^{\mu} d_i$  (4.4)

For further details, the reader may refer to Polak [13, p. 293].
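
The expansion with the averaged Hessian can be checked numerically in one dimension. The sketch below is an illustration only: the test function $f(t) = t^4$, the midpoint quadrature, and all names are assumptions chosen for the example, with `mu` playing the role of the scalar multiplying $d_i$.

```python
def averaged_hessian(hess, x, d, mu, n=2000):
    """Midpoint-rule approximation of 2 * integral_0^1 (1 - y) hess(x + y*mu*d) dy."""
    total = 0.0
    for k in range(n):
        y = (k + 0.5) / n
        total += (1.0 - y) * hess(x + y * mu * d)
    return 2.0 * total / n


f = lambda t: t ** 4            # illustrative objective
df = lambda t: 4.0 * t ** 3     # its first derivative
d2f = lambda t: 12.0 * t ** 2   # its second derivative

x, d, mu = 1.0, 1.0, 0.5
H_mu = averaged_hessian(d2f, x, d, mu)

lhs = f(x + mu * d) - f(x)
rhs = mu * df(x) * d + 0.5 * mu ** 2 * d * H_mu * d
print(lhs, rhs)
```

The two printed values agree up to quadrature error, which illustrates that the expansion is an identity rather than an approximation: all of the remainder is absorbed into the averaged Hessian.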


Theorem 4.1

Let $\{x_i\}$ be a sequence generated by the algorithm. Suppose that $x_i \to x^*$ and
$x^*$ satisfies the second order optimality conditions for problem P'. Then
there exists an integer $m$ so that test (2.5) passes for all $i \ge m$.


Proof

From (4.3) and (4.4) we get:

$f(x_i + \varepsilon d_i) - f(x_i) = \varepsilon \nabla f(x_i)^t d_i + \tfrac{1}{2} \varepsilon^2 d_i^t H_i^{\varepsilon} d_i$

$f(x_i - \varepsilon d_i) - f(x_i) = -\varepsilon \nabla f(x_i)^t d_i + \tfrac{1}{2} \varepsilon^2 d_i^t H_i^{-\varepsilon} d_i$

Adding we obtain:

$f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i) = \tfrac{1}{2} \varepsilon^2 d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i$  (4.5)

Now for $j \in I(x^*)$, $a_j^t x^* > b_j - c$. Since $x_i \to x^*$, then for $i$ large enough,
$a_j^t x_i > b_j - c$ so that $j \in I(x_i)$. By Step 1 of the algorithm $a_j^t w_i \le 0$ and so
$a_j^t d_i / \|d_i\| \le 0$ for $i$ large enough and $j \in I(x^*)$. Likewise, from Step 1 of the
algorithm $\nabla f(x_i)^t w_i \le 0$ and hence $\nabla f(x_i)^t d_i / \|d_i\| \le 0$. Since $x_i \to x^*$, then for
any number $\theta > 0$, $\nabla f(x^*)^t d_i / \|d_i\| \le \theta$ for $i$ large enough. Thus, Lemma 4.1 and
the second order conditions imply that

$d_i^t H(x^*) d_i \ge \tfrac{\gamma}{2} \|d_i\|^2$  for large $i$  (4.6)

Now note that

$\|H_i^{\varepsilon} - H(x^*)\| = \left\| 2 \int_0^1 (1 - y) [H(x_i + y \varepsilon d_i) - H(x^*)]\, dy \right\| \le 2 \int_0^1 (1 - y) \|H(x_i + y \varepsilon d_i) - H(x^*)\|\, dy$  (4.7)

Since $x_i \to x^*$, then by Lemma 4.2, $d_i \to 0$. Particularly, for $i$ large enough,
$\|H(x_i + y \varepsilon d_i) - H(x^*)\| < \tfrac{\gamma}{4}$ for all $y \in [0, 1]$. Since $2 \int_0^1 (1 - y)\, dy = 1$, (4.7) gives
$\|H_i^{\varepsilon} - H(x^*)\| < \tfrac{\gamma}{4}$. This together with (4.6) yields:

$d_i^t H_i^{\varepsilon} d_i = d_i^t H(x^*) d_i + d_i^t (H_i^{\varepsilon} - H(x^*)) d_i$
$\ge \tfrac{\gamma}{2} \|d_i\|^2 - \|d_i\|^2 \|H_i^{\varepsilon} - H(x^*)\|$
$\ge \tfrac{\gamma}{4} \|d_i\|^2$  for large $i$  (4.8)

Similarly,

$d_i^t H_i^{-\varepsilon} d_i \ge \tfrac{\gamma}{4} \|d_i\|^2$  for large $i$  (4.9)

From (4.5), (4.8), and (4.9) it immediately follows that

$f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i) \ge \varepsilon^2 \tfrac{\gamma}{4} \|d_i\|^2$  for large $i$  (4.10)

From (4.10), if test (2.5) fails for a large $i$, we must have:

$\varepsilon^2 \delta_i \|d_i\|^2 > f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i) \ge \varepsilon^2 \tfrac{\gamma}{4} \|d_i\|^2$

that is, $\delta_i > \tfrac{\gamma}{4}$. If the conclusion of the theorem does not hold, then test
(2.5) fails infinitely often and then $\delta_i \to 0$. This contradicts $\delta_i > \tfrac{\gamma}{4}$ for
large $i$, and the proof is complete.


Theorem 4.2

Let $\{x_i\}$ be a sequence generated by the algorithm. Suppose that $x_i \to x^*$ and
that $x^*$ satisfies the second order optimality conditions for Problem P'. Then
there exists an integer $m$ so that $f(x_i + \alpha_i d_i) - f(x_i) \le \tfrac{1}{3} \alpha_i \nabla f(x_i)^t d_i$ for all
$i \ge m$, that is, $k_i = 0$ for all $i \ge m$.


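Proof

By Theorem 4.1, test (2.5) passes for large $i$ so that $\lambda_i$ is given by

$\lambda_i = \dfrac{-\varepsilon^2 \nabla f(x_i)^t d_i}{f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i)} = \dfrac{-\nabla f(x_i)^t d_i}{\tfrac{1}{2} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i}$  (4.11)

If $\lambda_i \le 1$ so that $\alpha_i = \lambda_i$, then from (4.4) and (4.11) we get:

$f(x_i + \alpha_i d_i) - f(x_i) - \tfrac{1}{3} \alpha_i \nabla f(x_i)^t d_i = \tfrac{1}{2} \lambda_i^2 d_i^t H_i^{\lambda_i} d_i + \tfrac{2}{3} \lambda_i \nabla f(x_i)^t d_i$

$= \tfrac{1}{2} \lambda_i^2 [ d_i^t H_i^{\lambda_i} d_i - \tfrac{1}{2} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i ] - \tfrac{1}{12} \lambda_i^2 d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i$  (4.12)

Since $x_i \to x^*$, then by Lemma 4.2, $d_i \to 0$. Thus $H_i^{\lambda_i}$, $H_i^{\varepsilon}$, and $H_i^{-\varepsilon}$ converge to
$H(x^*)$ and the first term in (4.12) will be less than $\tfrac{\gamma}{24} \lambda_i^2 \|d_i\|^2$ for $i$ large
enough. As in the proof of Theorem 4.1, $d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i \ge \tfrac{\gamma}{2} \|d_i\|^2$ for large $i$.
Substituting in (4.12), the desired result holds.

Now suppose that $\lambda_i > 1$ so that $\alpha_i = 1$. Then

$f(x_i + \alpha_i d_i) - f(x_i) - \tfrac{1}{3} \alpha_i \nabla f(x_i)^t d_i = \tfrac{1}{2} d_i^t H_i^{1} d_i + \tfrac{2}{3} \nabla f(x_i)^t d_i$  (4.13)

Since $\lambda_i > 1$, then from (4.11) we must have

$\nabla f(x_i)^t d_i < -\tfrac{1}{2} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i$

Substituting in (4.13) we get:

$f(x_i + \alpha_i d_i) - f(x_i) - \tfrac{1}{3} \alpha_i \nabla f(x_i)^t d_i < \tfrac{1}{2} [ d_i^t H_i^{1} d_i - \tfrac{1}{2} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i ] - \tfrac{1}{12} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i$  (4.14)

That the right hand side of (4.14) is $\le 0$ for large $i$ follows exactly in the
same manner in which we proved that (4.12) is $\le 0$. This completes the proof.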


Finally, we state certain conditions in Theorem 4.3 below which guarantee
that $\lambda_i < 1$ so that $\alpha_i = \lambda_i$ for $i$ large enough.

Theorem 4.3

Let $\{x_i\}$ be a sequence generated by the algorithm. Suppose that $x_i \to x^*$ and
that $x^*$ satisfies the second order optimality conditions for Problem P'. If
$z < \tfrac{\gamma}{4}$, then there is an integer $m$ so that $\lambda_i < 1$ for all $i \ge m$, that is,
$\alpha_i = \lambda_i$ for all $i \ge m$.



Proof

By Theorem 4.1 there is an integer $m$ so that for $i \ge m$ we have:

$\lambda_i = \dfrac{-\varepsilon^2 \nabla f(x_i)^t d_i}{f(x_i + \varepsilon d_i) + f(x_i - \varepsilon d_i) - 2f(x_i)} = \dfrac{-\nabla f(x_i)^t d_i}{\tfrac{1}{2} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i}$  (4.15)

As in the proof of Theorem 4.1

$\tfrac{1}{2} d_i^t (H_i^{\varepsilon} + H_i^{-\varepsilon}) d_i \ge \tfrac{\gamma}{4} \|d_i\|^2$  for $i$ large enough  (4.16)

Since $w_i$ solves Problem $D(x_i)$, then there exist scalars $u_{ij} \ge 0$ for $j \in I(x_i)$
such that

$\nabla f(x_i) + z w_i + \sum_{j \in I(x_i)} u_{ij} a_j = 0$  (4.17)

$u_{ij} a_j^t w_i = 0$  for $j \in I(x_i)$  (4.18)

From (4.17) and (4.18) it follows that $\nabla f(x_i)^t w_i = -z \|w_i\|^2$. But by Theorem 3.1,
$x^*$ is a c-KT point and hence the optimal solution $w^*$ to Problem $D(x^*)$ is
$w^* = 0$. Since $x_i \to x^*$, by continuity of the optimal solution to Problem $D(\cdot)$
we have $w_i \to 0$, and since $b_j - a_j^t x_i \ge c$ for each $j \in I^+(w_i)$, it follows from (2.3) that $\beta_i = 1$
for large $i$. Thus $d_i = w_i$ so that

$\nabla f(x_i)^t d_i = -z \|d_i\|^2$  for large $i$  (4.19)

Substituting (4.19) and (4.16) in (4.15) gives $\lambda_i \le 4z/\gamma$, so it is clear that $\lambda_i < 1$ for $i$ large
enough, and the proof is complete.
ABSTRACT

Optimality conditions for families of nonlinear programming
problems in $R^n$ are studied from a generic point of view. The
objective function and some of the constraints are assumed to depend
on a parameter, while others are held fixed. Under suitable
conditions, certain strong second-order conditions are shown to be
necessary for optimality except possibly for parameter values lying
in a negligible set.
I. Introduction.

For families of nonlinear programming problems of the type

$(Q_p)$  min $f(x,p)$ in $x$ subject to $g(x,p) \le 0$, $h(x,p) = 0$, and $x \in C$

we derive optimality conditions which are generically necessary
in the sense that they hold at all local minimizers for $(Q_p)$, unless
$p$ belongs to a certain first category set of measure zero.
Here, $P$ is an open subset of Euclidean space (or more generally a
manifold), $f$, $g$, and $h$ map $R^n \times P$ into $R$, $R^I$, and $R^J$, respectively,
$I$ and $J$ being finite sets, and the inequality $g(x,p) \le 0$ [resp.,
the equality $h(x,p) = 0$] is interpreted coordinatewise.

In Spingarn and Rockafellar [7], such conditions for one specific
class $(Q_p)$ were derived: right-hand-side perturbations of the
constraints and linear perturbations of the objective function. For
that class it was demonstrated that, except possibly for problems
$(Q_p)$ for $p$ in a set of measure zero, the "strong second-order
conditions" (the Kuhn-Tucker conditions with strict complementary
slackness, linear independence of the active constraint
gradients, and positive definiteness of the Hessian of the
Lagrangian on the subspace perpendicular to the gradients of the
active constraints) hold at every local minimizer for $(Q_p)$.

When studying questions of genericity, the class of problems
to which the results apply is crucial. The classes of problems
considered in this paper are more general than in [7] in two ways.
First, the manner in which $f$, $g$, and $h$ depend on $p$ is given more
freedom. Rather than requiring perturbations of a special (e.g.
right-hand-side) type, we will only require that the family of
problems satisfy a general and easily verifiable criterion. Second,
in addition to the constraints $g \le 0$ and $h = 0$, which we refer
to as the "variable" constraints, we also investigate the effect
of the "structural" or "fixed" constraint $x \in C$ that does not
vary with $p$. The distinction between these two types of constraints
is important here because the two types play different
roles both in the analysis of the conditions and in the statement
of the conditions themselves: the conditions that turn out to be
generically necessary for optimality depend on the particular
class of problems under consideration.

The regularity conditions that we impose on the set $C$ have
been incorporated into our definition of "cyrtohedron". Cyrtohedra,
which were introduced in [5], are piecewise smooth sets
that can be represented locally by a finite number of nonlinear
inequality and equality constraints. They are similar to, but
more general than, the "manifolds-with-corners" studied by
Schecter [4].

The idea to study mathematical programming problems from the
generic point of view goes back to the Saigal and Simon study [3]
of the complementarity problem. Several others have studied questions
which arise in economics concerning the generic properties
of equilibrium models and Pareto optima. The dominant notion of
a "generic" property in all of these studies has been the category
theoretic one, relative to spaces of differentiable mappings under
the Whitney topology, rather than the "measure zero" notion used
here, and which we feel is better suited for studying nonlinear
programming problems.
II. Preliminaries and notation.

A set $M \subset R^n$ is a k-dimensional $C^s$ submanifold ($s \ge 1$) if for
each $x \in M$ there is an open set $U \subset R^k$ and a $C^s$ diffeomorphism $\Phi$
mapping $U$ onto a neighborhood of $x$ in $M$ [2]. For any $x = \Phi(q) \in M$,
$M_x = \text{range } d\Phi(q)$ is the tangent space to $M$ at $x$. If $f : R^n \to R$,
then "$f|M$" denotes the restriction of $f$ to $M$. For any $x \in R^n$,
"$\nabla f(x)$" denotes the ordinary gradient of $f$ at $x$, while "$\nabla(f|M)(x)$"
denotes the gradient of $f|M$ at $x$, the latter being a linear function
on $M_x$. If $\nabla(f|M)(x) = 0$ (i.e., if $\nabla f(x)$ is perpendicular to
$M_x$), then $x$ is a critical point for $f$ on $M$, and in this case the
Hessian for $f|M$ at $x = \Phi(q)$ is the bilinear function on $M_x$ defined
by

$(\nabla^2 (f|M)(x))(\bar{u}, \bar{v}) = (\nabla^2 (f \circ \Phi)(q))(u, v)$

where $\bar{u} = d\Phi(q) u$, $\bar{v} = d\Phi(q) v$, and $\nabla^2 (f \circ \Phi)(q)$ is the ordinary
Hessian of $f \circ \Phi$. If $\nabla^2 (f \circ \Phi)(q)$ is nonsingular, then $x$ is a
nondegenerate critical point [1].

A subset $S \subset R^n$ is of measure zero provided for every $\varepsilon > 0$,
$S$ can be covered by a countable family of n-rectangles, the sum of
whose measures is less than $\varepsilon$ [1]. $S \subset R^n$ is of first category
provided $S$ is a countable union of sets whose closures have empty
interior. We will call $S$ a negligible set if $S$ is both of measure
zero and first category.
If $F$, $N$, $S$ are submanifolds, $S \subset N$, $f : F \to N$, then $f : F \to N$
is transverse to $S$ if $N_y = S_y + df(x)(F_x)$ whenever $y = f(x) \in S$.
For a proof of the following, consult Hirsch [1]:

(2.1) THEOREM (Parametric Transversality). Let $F$, $S$, $N$ be $C^s$
submanifolds, $P$ open, with $S \subset N$, $\phi : F \times P \to N$ of class $C^s$,
$s > \max\{0, \dim F + \dim S - \dim N\}$, and let $\phi$ be transverse to $S$.
Then there is a subset $P' \subset P$ such that $P \setminus P'$ is negligible and for
all $p \in P'$, $\phi(\cdot, p) : F \to N$ is transverse to $S$.

(2.2) COROLLARY. Let $f : F \times P \to R$ be $C^2$, $P$ open, $F \subset R^n$ a $C^2$
submanifold, and assume for each $x \in F$ that the Jacobian of the function
$p \mapsto \nabla_x f(x, \cdot)$ is of rank $n$ at all $p \in P$. Then except for $p$ in
a negligible set, all critical points of $f(\cdot, p)$ on $F$ are nondegenerate.

Proof: Let $TF = \{(x, \xi) \in R^n \times R^n : x \in F, \xi \in F_x\}$, $\phi(x,p) =
(x, \nabla_x f(x,p))$. For each $p \in P$, $\phi(\cdot, p)$ is transverse to $F \times \{0\}$ if,
and only if, all the critical points of $f(\cdot, p)$ on $F$ are nondegenerate.
But the hypothesis implies that $\phi(x, \cdot)$ is transverse to
$F \times \{0\}$ for each $x \in F$, and hence that $\phi(\cdot, \cdot)$ is transverse to
$F \times \{0\}$. We then apply the theorem with $s = 1$, $N = TF$, and $S =
F \times \{0\}$. □
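
A one-dimensional instance may help fix ideas for Corollary (2.2). The family below is chosen here purely for illustration and does not come from the paper; it takes $n = 1$ and $F = P = R$.

```latex
f(x,p) = \tfrac{1}{3} x^3 - p x , \qquad
\nabla_x f(x,p) = x^2 - p , \qquad
\frac{\partial}{\partial p} \nabla_x f(x,p) = -1 \quad (\text{rank } 1 = n).

% Critical points and their second derivatives
p > 0:\; x = \pm\sqrt{p},\;\; \nabla_x^2 f = \pm 2\sqrt{p} \ne 0 ; \qquad
p < 0:\; \text{no critical points} ; \qquad
p = 0:\; x = 0,\;\; \nabla_x^2 f = 0 .
```

The rank hypothesis holds everywhere, and the only parameter value with a degenerate critical point is $p = 0$, a single point and hence a negligible set, as the corollary predicts.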

(2.3) COROLLARY. Let $F$, $S$, $N$ be $C^1$ submanifolds, $P$ open, $S \subset N$,
$\phi : F \times P \to N$ of class $C^1$, $\dim F + \dim S - \dim N < 0$, and let $\phi$ be
transverse to $S$. Then there is a subset $P' \subset P$ such that $P \setminus P'$ is
negligible and for all $p \in P'$, $\phi(x,p) \notin S$ for all $x \in F$.

Proof: It follows from the fact that if $\phi(\cdot, p)$ is transverse
to $S$, then the dimension requirements force $\phi(x,p) \notin S$ for all
$x \in F$. □

For any $S \subset R^n$, "rank $S$" denotes the dimension of the linear
subspace "span $S$" spanned by $S$. "relint $S$" is the interior of $S$
relative to the affine flat spanned by $S$.

Let $U \subset R^n$ be an open set, $G_\alpha$, $\alpha \in A$ and $H_\beta$, $\beta \in B$, finite
collections of differentiable functions on $U$. For any $A_0 \subset A$ and
$x \in U$, define

$\Gamma(x, A_0) = \{\nabla G_\alpha(x) : \alpha \in A_0\} \cup \{\nabla H_\beta(x) : \beta \in B\}$

$Z(A_0) = \{y \in U : 0 = G_\alpha(y) = H_\beta(y) \ \forall \alpha \in A_0, \forall \beta \in B\}$.

A nonempty connected set $C \subset R^n$ is a cyrtohedron of class $C^s$ ($s \ge 1$)
if for every $\bar{x} \in C$, there are finitely many $C^s$ functions $G_\alpha$, $\alpha \in A$,
and $H_\beta$, $\beta \in B$, defined on a neighborhood $U \subset R^n$ of $\bar{x}$ such that
$\bar{x} \in Z(A)$ and

(2.4) (a) For all $x \in U$, $x \in C$ if, and only if,
$G_\alpha(x) \le 0 \ \forall \alpha \in A$ and $H_\beta(x) = 0 \ \forall \beta \in B$.

(b) If $\sum_A a_\alpha \nabla G_\alpha(\bar{x}) + \sum_B b_\beta \nabla H_\beta(\bar{x}) = 0$ for some $a \in R^A_+$ and
$b \in R^B$, then $a = 0$ and $b = 0$.

(c) For each $A_0 \subset A$ there is an integer $s(A_0)$ such that
rank $\Gamma(x, A_0) = s(A_0)$ for all $x \in U$.
If $C$ is a cyrtohedron, then $U$ may always be chosen [5] so that

(b') For all $x \in U$, (b) holds with $x$ in place of $\bar{x}$
(c') If $A_0 \subset A_1 \subset A$ and $s(A_0) = s(A_1)$ then $Z(A_0) = Z(A_1)$
(d) For all $A_0 \subset A$, $Z(A_0)$ is a connected $(n - s(A_0))$-dimensional
submanifold

and when this is done, we will say that $(G_\alpha (\alpha \in A), H_\beta (\beta \in B), U)$, or
more briefly $(G_\alpha, H_\beta, U)$, is a local representation (abbr. l.r.) for
$C$.

Let $(G_\alpha, H_\beta, U)$ be a l.r., $x \in C \cap U$. Letting $A_+(x) =
\{\alpha \in A : G_\alpha(x) = 0\}$, we define

$T_C(x) = \{\xi \in R^n : \xi \cdot \nabla G_\alpha(x) \le 0 \ \forall \alpha \in A_+(x), \ \xi \cdot \nabla H_\beta(x) = 0 \ \forall \beta \in B\}$

$L_C(x) = \{\xi \in R^n : \xi \cdot \nabla G_\alpha(x) = 0 \ \forall \alpha \in A_+(x), \ \xi \cdot \nabla H_\beta(x) = 0 \ \forall \beta \in B\}$.

The dimension of $C$ is defined to be $\dim C = n - |B|$. It does not
depend on $x$ or on the particular local representation.

For $x, y \in C$, define an equivalence relation $\sim$ by specifying
$x \sim y$ if, and only if, there is a sequence $x = x_0, x_1, \ldots, x_p = y$
in $C$ such that for each pair $(x_i, x_{i+1})$ ($i = 0, \ldots, p-1$), there is a
l.r. $(G_\alpha, H_\beta, U)$ such that $Z(A) \supset \{x_i, x_{i+1}\}$. The equivalence classes
under this relation are the faces of $C$. The proof of the following
may be found in [5]:
(2.5) THEOREM. Let $C \subset R^n$ be a cyrtohedron of class $C^s$ ($s \ge 1$),
$x \in C$. Then $x$ lies on a unique face $F$ of $C$, and $F$ is a connected
$C^s$ submanifold of $R^n$. The tangent space to $F$ at $x$ is $L_C(x)$.
There is a l.r. $(G_\alpha, H_\beta, U)$ for $C$ such that $x \in Z(A)$, and for any
such l.r., $Z(A) = F \cap U$ and $\dim F = \dim L_C(x) = n - s(A)$.
III. First-order conditions.

In this section, certain first-order conditions (3.2) are
shown to be generically necessary for optimality. This will be
done by showing that a constraint qualification, called the
"independence criterion", is generically satisfied at all feasible
points. We will then appeal to a result from [5] stating that in
the presence of this qualification, these conditions are necessary
for optimality.

It is assumed here that $f$, $g$, and $h$ are of class $C^1$ on $R^n$,
and $C \subset R^n$ is a d-dimensional cyrtohedron.

If $x$ is feasible for (Q), the independence criterion (IC) is
satisfied for (Q) at $x$ if for any $a \in R^{I_+}$ and $b \in R^J$,

(IC)  $\sum_{I_+} a_i \nabla g_i(x) + \sum_J b_j \nabla h_j(x) \in L_C(x)^{\perp}$ implies $0 = a = b$.

It is trivially satisfied if $I = J = \emptyset$. If $C = R^n$, IC says that
the gradients of the active constraints at $x$ are linearly independent.
More generally, if $F$ is the face of $C$ that contains $x$, IC
says that the gradients of $g_i|F$, $i \in I_+$, and $h_j|F$, $j \in J$, at $x$ form
a linearly independent set. From [5], we have:

(3.1) THEOREM. If $x$ is a local minimizer for (Q) and if the
independence criterion is satisfied at $x$, then there exist $y \in R^I_+$ and
$z \in R^J$ such that

(3.2)  (i) $-\nabla_x L(x,y,z) \in N_C(x)$

       (ii) $y_i > 0$ implies $g_i(x) = 0 \ \forall i \in I$.

Showing that the first-order conditions 3.2 are necessary for
optimality in "most" problems reduces, by this theorem, to showing
that IC holds for "most" problems.

Let  E  :  Rn  -*■  R1  x  rJ  be  given  by  E(x)  =  (g(x),h(x)).  (If 

I  =  J  =  0,  then  R1  x  rJ  =  {0}  and  E(x)  =  0) ,  and  for  any  I'  c  I, 
define  Q  ( I '  )  =  {  (x,  0)  £  R1  x  rJ  :  x.  =  C  Vie  I'}. 

(3.3)  LEMMA.  Let  x  be  feasible  for  (Q) .  The  independence  cri¬ 
terion  for  Q )  is  satisf iea  at  x  if,  and  only  if, 


(3.4) 


R1  x  rJ  =  dE(x)  (Lc(x)  )  +  ft(I+(x)  )  . 


Proof: $dE(x)$ is the $(|I| + |J|) \times n$ matrix whose rows are the
gradients of $g_i$, $i \in I$, and $h_j$, $j \in J$. Let $c = (a, b)$ represent an
arbitrary $(|I| + |J|)$-dimensional column vector. IC holds at x if,
and only if, there exists no $c = (a, b) \neq 0$ with $a \in R^{I_+}$ such that
$c'\,dE(x)z = 0$ for all $z \in L_C(x)$, an assertion that is easily seen to
be equivalent to 3.4. □


(3.5)  LEMMA. Let F be a face of C. If $E|F : F \to R^I \times R^J$ is
transverse to $\Omega(I')$ for every $I' \subset I$, then IC is satisfied at every $x \in F$
which is feasible for (Q).

Proof: Immediate from the definition of transversality and the
preceding lemma. □

Now suppose that f, g, and h are of class $C^1$ on $R^n \times P$, and
let $E : R^n \times P \to R^I \times R^J$ be given by $E(x,p) = (g(x,p),h(x,p))$. We
say the family $(Q_p)$ is full with respect to constraints if the
Jacobian of the function $p' \mapsto E(x,p')$ has rank $|I| + |J|$ at
every $(x,p) \in C \times P$. The usual right-hand-side perturbations fit
this requirement; here, $P = R^I \times R^J$, and for any $p = (s,t) \in P$,
$g(x,p) = u(x) - s$ and $h(x,p) = v(x) - t$ for some $C^1$ functions u
and v.
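
As a quick check of the right-hand-side case (a sketch that uses only the definitions just given, with $I_{|I|}$ and $I_{|J|}$ denoting identity matrices of the indicated sizes), the Jacobian in the parameter can be written out explicitly:

```latex
\[
E\bigl(x,(s,t)\bigr) = \bigl(u(x) - s,\; v(x) - t\bigr),
\qquad
\frac{\partial E}{\partial (s,t)}\bigl(x,(s,t)\bigr) =
\begin{pmatrix} -I_{|I|} & 0 \\ 0 & -I_{|J|} \end{pmatrix}.
\]
```

This matrix has rank $|I| + |J|$ at every $(x,p)$, whatever the functions u and v are, which is why such a family is full with respect to constraints.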

(3.6)  PROPOSITION. Let F be a face of C. Assume that C, g, and
h are of class $C^s$, with $s > \max(0, d - |J|)$ (d = dim C), and that $(Q_p)$
is full with respect to constraints. Then there is a subset $P_F \subset P$
such that $P \setminus P_F$ is negligible, and for all $p \in P_F$, IC holds at all
$x \in F$ which are feasible for $(Q_p)$.

Proof: Since $(Q_p)$ is full with respect to constraints, the Jacobian
of the function $p' \mapsto E(x,p')$ has rank $|I| + |J|$ at all $(x,p) \in F \times P$.
In particular, $E|(F \times P) : F \times P \to R^I \times R^J$ is trivially transverse to
any submanifold of $R^I \times R^J$.

For each $I' \subset I$, $\Omega(I') \subset R^I \times R^J$ is a subspace of dimension
$|I| - |I'| \le |I|$. Since $E|(F \times P)$ is transverse to $\Omega(I')$, and since

$\dim F + \dim \Omega(I') - \dim(R^I \times R^J) \le d + |I| - (|I| + |J|) = d - |J|$,

and since F and E are of class $C^s$ with $s > \max(0, d - |J|)$, it follows
by 2.1 that there is a subset $P_F \subset P$ with negligible complement
such that for all $p \in P_F$, the function $E|(F \times \{p\}) : F \to R^I \times R^J$ is
transverse to $\Omega(I')$. Clearly, it may be assumed that $P_F$ has this
property for all $I' \subset I$. By Lemma 3.5, for all $p \in P_F$, if $x \in F$ is
feasible for $(Q_p)$, then IC is satisfied at x. □


(3.7)  LEMMA.  A  cyrtohedron  has  only  countably  many  faces. 


Proof: Let $(G_\alpha, H_\beta, U)$ be a l.r. for C. It is enough to show that
U meets only countably many faces of C. For each $x \in U \cap C$, define
$A_+(x) = \{\alpha \in A : G_\alpha(x) = 0\}$. Fix $A' \subset A$, and let
$T(A') = \{x \in U \cap C : A_+(x) = A'\}$. Clearly it is enough to show that
$T(A')$ meets only countably many faces of C. For each $y \in T(A')$ there
is an open ball $V_y \subset U$ about y, such that
$(G_\alpha\ (\alpha \in A'),\ H_\beta\ (\beta \in B),\ V_y)$ is a l.r. for C and
$G_\alpha < 0$ in $V_y$ for all $\alpha \in A \setminus A'$. By definition of "face", the
set $V_y \cap T(A')$ is contained in a single face of C. Thus each
$y \in T(A')$ has a neighborhood in $T(A')$ lying in a single face of C,
showing $T(A')$ meets only countably many faces of C. □

(3.8)  PROPOSITION. Let C, g, and h be of class $C^s$ with
$s > \max(0, d - |J|)$ (d = dim C), and let $(Q_p)$ be full with respect
to constraints. Then there is a subset $P_C \subset P$ with negligible
complement such that if $p \in P_C$ and x is feasible for $(Q_p)$, then x
satisfies IC for $(Q_p)$.

Proof: For each face F of C, let $P_F$ be as in Proposition 3.6.
By Lemma 3.7, $P_C = \bigcap_F P_F$ has the desired property. □

Combining  this  with  Theorem  3.1,  we  obtain 


(3.9)  THEOREM. Let C, g, and h be of class $C^s$ with $s > \max(0, d - |J|)$
(d = dim C), and let $(Q_p)$ be full with respect to constraints.
Then there is a subset $P_C \subset P$ with negligible complement such that
if $p \in P_C$ and $x \in C$ is a local minimizer for $(Q_p)$, then there exists
$(y,z) \in R^I_+ \times R^J$ such that

(3.10)  (i)  $-\nabla_x L(x,y,z,p) \in N_C(x)$

        (ii)  $\forall i \in I$, $y_i > 0$ implies $g_i(x,p) = 0$.

The assumption that $(Q_p)$ is full with respect to constraints
can be weakened somewhat:


(3.11)  COROLLARY. If there is a closed subset $P' \subset P$ of measure
zero such that the subfamily $\{(Q_p) : p \in P \setminus P'\}$ is full with respect
to constraints, then the conclusion of 3.9 holds.


Proof: Apply Theorem 3.9 to the subfamily. □

IV.  Generic Second-Order Conditions

Henceforth, f, g, h, and C are assumed to be of class $C^2$.
Let $R^r = R^n \times R^I \times R^J$, and define $\tau : R^r \to R^r$ by

$\tau(w) = (\nabla_x L(w),\ -\nabla_y L(w),\ -\nabla_z L(w))$   ($w = (x,y,z)$).

If we let $\tilde{C} = C \times R^I_+ \times R^J$, then $\tilde{C} \subset R^r$ is also a cyrtohedron of class
$C^2$.

The  second-order  conditions  which  we  show  here  to  be  generi- 
cally  necessary  for  optimality  are  the  generalized  strong  second- 
order  conditions  discussed  previously  in  Spingarn  [5] .  A  point 
$w = (x,y,z) \in \tilde{C}$ is said to satisfy these conditions for the problem
(Q) if

(SSOC)  (i)  x is feasible for (Q)

        (ii)  $-\nabla_x L(w) \in \operatorname{relint} N_C(x)$

        (iii)  $\forall i \in I$, $y_i > 0$ if, and only if, $g_i(x) = 0$

        (iv)  The independence criterion for (Q) holds
              at x

        (v)  If F is the face of C containing x, then
             $\nabla_x^2 (L|F)(w)(\xi,\xi) > 0$ for all $\xi \in R^n$ satisfying
             $0 \neq \xi \in L_C(x)$ and $\xi \cdot \nabla g_i(x) = \xi \cdot \nabla h_j(x) = 0$
             for all $i \in I_+$, $j \in J$.

For a more detailed discussion of these conditions, and a discussion
of their relationship to the classical conditions, we refer to [5].

If a particular representation $(G_\alpha, H_\beta, U)$ for C near x is
chosen, these conditions could be rephrased in terms of the functions
$G_\alpha$ and $H_\beta$, without ever mentioning the set C. We have
avoided  doing  this  for  several  reasons.  Most  important,  the  roles 
played  by  the  two  types  of  constraints,  fixed  and  variable,  are 
not  the  same,  and  the  above  formulation  emphasizes  the  different 
ways  they  enter  into  the  conditions.  Also,  this  formulation  sug¬ 
gests  the  possibility  of  generalizing  the  conditions  to  a  broader 
class  of  sets  C.  Consider,  for  example,  the  set 

$C = \{x = (x_1,x_2,x_3) \in R^3 : |x| \le 1 \text{ and } x_1 + x_2 + x_3 \ge |x|\}$.

Because no representation of the type 2.4 exists for C near x = 0,
C is not a cyrtohedron. But, like a cyrtohedron, C can be partitioned
into "faces" (four in this case) that are submanifolds,
and $N_C(x)$ and $L_C(x)$ have obvious meanings, so the conditions SSOC,
as  stated,  are  still  meaningful.  In  fact,  C  has  all  the  properties 
that  are  required  for  our  proof  of  the  genericity  of  SSOC.  We  do 
not  know  if  there  is  a  "natural"  broader  class  to  which  our  re¬ 
sults  apply.  It  seems  that  the  conditions  should  be  generic  for 
sets  C  that  look  (in  some  sense)  locally  like  the  intersection  of 
a  cone  with  a  neighborhood  of  the  origin.  One  possible  class 
would  be  those  sets  C  such  that  each  x  e  C  has  a  neighborhood  U 
such that for some diffeomorphism $\phi$, and some closed convex cone
K, $\phi(x) = 0$ and $\phi(C \cap U) = \phi(U) \cap K$. For this class, the proof of
the  genericity  of  the  above  conditions  does  indeed  go  through,  but 
since  this  class  does  not  seem  to  include  cyrtohedra,  it  is  not  as 
broad  as  one  would  like. 

We observed in [5] that for any $w = (x,y,z) \in \tilde{C}$ with x feasible
for (Q),

(4.1)  w satisfies 3.2  $\iff$  $-\tau w \in N_{\tilde{C}}(w)$

(4.2)  if x is a local minimizer, SSOC holds  $\iff$

       (a)  $-\tau w \in \operatorname{relint} N_{\tilde{C}}(w)$ and

       (b)  w is a nondegenerate critical point for L on G,
            the face of $\tilde{C}$ containing w.

Our proof of the generic necessity of SSOC will proceed as follows.
If x is a local minimizer, then from the previous section we have
the (generic) existence of y and z satisfying the first-order
conditions 3.2. Let w = (x,y,z). From 4.1, it follows that
$-\tau w \in N_{\tilde{C}}(w)$. By 4.3, $-\tau w \in N_{\tilde{C}}(w)$ implies (generically) that
$-\tau w \in \operatorname{relint} N_{\tilde{C}}(w)$, so it will follow that 4.2a holds. By 2.2 we
know (generically) that all critical points of L on all faces of
$\tilde{C}$ are nondegenerate, so that 4.2b also, and hence SSOC holds. □

(4.3)  PROPOSITION. Let $C \subset R^n$ be a cyrtohedron of class $C^2$, P
open, and $\tau : R^n \times P \to R^n$ a $C^1$ function. Suppose that for each
$(x,p) \in C \times P$, the map $p' \mapsto \tau(x,p')$ has Jacobian of rank n at (x,p).
Then there is a subset $P_0 \subset P$ such that $P \setminus P_0$ is negligible and for
all $p \in P_0$ and all $x \in C$,

(4.4)  $\tau(x,p) \in N_C(x) \implies \tau(x,p) \in \operatorname{relint} N_C(x)$.

Proof: Let F be a face of C. For every $x \in F$, there is a l.r.
$(G_\alpha, H_\beta, U)$ for which $x \in Z(A) = F \cap U$. For each such l.r., we will
show that there is a subset $\hat{P} \subset P$ with $P \setminus \hat{P}$ negligible such that if
$p \in \hat{P}$ and $x \in F \cap U$, then 4.4 holds. F may be covered by sets U
corresponding to countably many such l.r. Taking the intersection
of the corresponding sets $\hat{P}$ gives a set $P_F$ such that 4.4 is
satisfied for all $p \in P_F$ and all $x \in F$. By Lemma 3.7, the set $P_0 = \bigcap_F P_F$
(taking the intersection over all faces F of C) will have the
desired property.

So fix a face F, $x \in F$, and $(G_\alpha, H_\beta, U)$ such that $x \in Z(A) = F \cap U$.
For any $\tau \in N_C(x) \setminus \operatorname{relint} N_C(x)$, it follows from the definition of
$N_C(x)$ that there exists $A_0 \subset A$ such that
$\tau \in \operatorname{span} T(x,A_0) \subsetneq \operatorname{span} T(x,A)$.
Now, for any $A_0 \subset A$, $s(A_0) = \operatorname{rank} T(x,A_0)$ for all $x \in U$, so it
suffices to show for any $A_0 \subset A$ with $s(A_0) < s(A)$, that except for
$p \in P$ belonging to a negligible subset, $\tau(x,p) \notin \operatorname{span} T(x,A_0)$ for
all $x \in F \cap U$. Henceforth, we fix $A_0 \subset A$ such that $s(A_0) < s(A)$.

Let $N = (F \cap U) \times R^n$ and $S = \{(x,w) \in N : w \in \operatorname{span} T(x,A_0)\}$.
Since C is of class $C^2$, S is a $(\dim F + s(A_0))$-dimensional $C^1$
submanifold, and N is a $(\dim F + n)$-dimensional $C^2$ submanifold.
Define $\phi(x,p) = (x, \tau(x,p))$, and fix $x \in F \cap U$, $p \in P$ such that
$\phi(x,p) \in S$. By hypothesis, range $d_p\phi(x,p) = \{0\} \times R^n$. Also,
$N_{\phi(x,p)} = F_x \times R^n$ and $S_{\phi(x,p)} = F_x \times K$ for some subspace $K \subset R^n$.
Hence $N_{\phi(x,p)} = S_{\phi(x,p)} + \operatorname{range} d_p\phi(x,p)$, showing that
$\phi(x,\cdot) : P \to N$ is transverse to S, and hence that
$\phi : (F \cap U) \times P \to N$ is transverse to S. By 2.5,

$\dim(F \cap U) + \dim S - \dim N = \dim F + s(A_0) - n < \dim F + s(A) - n = 0$.

So, by 2.3, there is a subset $P(A_0) \subset P$ with $P \setminus P(A_0)$ negligible,
such that for all $p \in P(A_0)$ and all $x \in F \cap U$, we have $\phi(x,p) \notin S$, or
equivalently, $\tau(x,p) \notin \operatorname{span} T(x,A_0)$. □

The family $(Q_p)$ will be called full provided the function
$p' \mapsto \nabla_w L(w,p') \in R^r$ has Jacobian of rank r at all $(w,p) \in \tilde{C} \times P$.
This notion should not be confused with "full with respect to
constraints", which is a weaker property:

(4.5)  PROPOSITION. If $(Q_p)$ is full, then it is full with respect
to constraints.

Proof: $(Q_p)$ is full with respect to constraints if, and only if,
the Jacobian of $p' \mapsto \nabla_{y,z} L(w,p')$ has full rank $|I| + |J|$ at
every $(w,p) \in \tilde{C} \times P$. When it does not have full rank, then neither
does the Jacobian of $p' \mapsto \nabla_w L(w,p') = \nabla_{x,y,z} L(w,p')$, so $(Q_p)$ is
not full. □

For an example, suppose that $u : R^n \to R^I$, $v : R^n \to R^J$, and
$l : R^n \to R$ are $C^2$ functions. Let $P = R^n \times R^I \times R^J$, and for any
$p = (q,s,t) \in P$, define $g(x,p) = u(x) - s$, $h(x,p) = v(x) - t$, and
$f(x,p) = l(x) - x \cdot q$. Then the Jacobian of $p \mapsto \nabla_w L(w,p)$ is minus
the identity matrix, and hence of rank r.
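
The claim in this example can be checked numerically on a small instance. The sketch below is only an illustration: the functions l, u, v and the dimensions (n = 2, one inequality, one equality) are hypothetical choices, and the Jacobian in p is approximated by central differences rather than computed symbolically.

```python
def grad_w_L(w, p):
    """Gradient in w = (x1, x2, y, z) of
    L(w, p) = l(x) - x.q + y * (u(x) - s) + z * (v(x) - t),
    with the hypothetical choices l(x) = x1^2 + x2^4, u(x) = x1 * x2,
    v(x) = x1 + x2^2 and parameter p = (q1, q2, s, t)."""
    x1, x2, y, z = w
    q1, q2, s, t = p
    return [
        2 * x1 - q1 + y * x2 + z,              # dL/dx1
        4 * x2**3 - q2 + y * x1 + 2 * z * x2,  # dL/dx2
        x1 * x2 - s,                           # dL/dy = g(x, p)
        x1 + x2**2 - t,                        # dL/dz = h(x, p)
    ]


def jacobian_in_p(w, p, eps=1e-6):
    """Central-difference Jacobian of p -> grad_w_L(w, p).
    Entry [i][j] approximates d(grad_w_L)_i / d p_j."""
    r = len(p)
    jac = [[0.0] * r for _ in range(r)]
    for j in range(r):
        p_plus = list(p)
        p_minus = list(p)
        p_plus[j] += eps
        p_minus[j] -= eps
        g_plus = grad_w_L(w, p_plus)
        g_minus = grad_w_L(w, p_minus)
        for i in range(r):
            jac[i][j] = (g_plus[i] - g_minus[i]) / (2 * eps)
    return jac


def is_minus_identity(jac, tol=1e-6):
    """True if jac equals minus the identity matrix up to tol."""
    r = len(jac)
    return all(
        abs(jac[i][j] - (-1.0 if i == j else 0.0)) <= tol
        for i in range(r)
        for j in range(r)
    )


if __name__ == "__main__":
    w = [0.3, -1.2, 0.5, 2.0]
    p = [0.1, 0.2, 0.3, 0.4]
    print(is_minus_identity(jacobian_in_p(w, p)))
```

Because the gradient is affine in p, the finite-difference Jacobian agrees with minus the identity up to rounding at any point (w, p), which is the rank-r property used in the text.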

Previously, we saw that the first-order conditions 3.2 and
3.10 are necessary for optimality for most $p \in P$ if $(Q_p)$ is full
with respect to constraints and sufficient differentiability is
assumed. When $(Q_p)$ is full then, for most p, the stronger conditions
SSOC are also satisfied provided that the first-order conditions
are:


(4.6)  THEOREM. Let $C \subset R^n$ be a cyrtohedron of class $C^2$, P open,
and let f, g, and h be $C^2$ functions on $R^n \times P$. If $(Q_p)$ is full,
then there is a subset $P_0 \subset P$ such that $P \setminus P_0$ is negligible and for
all $p \in P_0$: if $x \in C$ is a local minimizer for $(Q_p)$, and if $y \in R^I_+$
and $z \in R^J$ satisfy 3.10, then SSOC holds.


Proof: Since $(Q_p)$ is full, the hypotheses for Proposition 4.3 are
satisfied with $\tilde{C} \subset R^r$ in place of $C \subset R^n$ and $-\tau$ in place of $\tau$. So,
there is a subset $P' \subset P$ with negligible complement such that for
any $p \in P'$ and $w \in \tilde{C}$, $-\tau(w,p) \in N_{\tilde{C}}(w)$ implies
$-\tau(w,p) \in \operatorname{relint} N_{\tilde{C}}(w)$.

Since $(Q_p)$ is full, the Jacobian of $p' \mapsto \nabla_w L(w,p') \in R^r$ is of
rank r at every $(w,p) \in \tilde{C} \times P$. By 2.2, for every face G of $\tilde{C}$, there
is a set P(G) with negligible complement in P such that $L(\cdot,p)$ has
only nondegenerate critical points on G for all $p \in P(G)$. Let
$P'' = \bigcap P(G)$, taking the intersection over all (countably many by
Lemma 3.7) faces of $\tilde{C}$, and define $P_0 = P' \cap P''$.

Fix $p \in P_0$, x a local minimizer for $(Q_p)$, and let w = (x,y,z)
satisfy 3.10. Then $-\tau(w,p) \in N_{\tilde{C}}(w)$ by 4.1, which implies that w
is a critical point for $L(\cdot,p)$ on the face G of $\tilde{C}$ containing w by
[5, Lemma 3.1c], and that $-\tau(w,p) \in \operatorname{relint} N_{\tilde{C}}(w)$ since $p \in P'$. Since
$p \in P''$, w is a nondegenerate critical point. Thus both parts of
4.2 are satisfied and SSOC holds. □

(4.7)  THEOREM. Let $C \subset R^n$ be a d-dimensional cyrtohedron of class
$C^s$, P open, f of class $C^2$ and g and h of class $C^s$ on $R^n \times P$ with
$s > \max\{1, d - |J|\}$. If $(Q_p)$ is full, there is a subset $P_0 \subset P$ with
$P \setminus P_0$ negligible such that for all $p \in P_0$: if $x \in C$ is a local
minimizer for $(Q_p)$ there exists $(y,z) \in R^I_+ \times R^J$ satisfying SSOC.

Proof: Combine Theorems 3.9 and 4.6 and Proposition 4.5. □

In the manner of Corollary 3.11, it follows that the conclusion
of Theorem 4.7 is still valid if there is a closed measure zero
subset $P' \subset P$ such that the subfamily $\{(Q_p) : p \in P \setminus P'\}$ is full.

I.  Introduction.

In  nonlinear  programming  theory  there  is  a  large  gap  between 
the  weak  first-order  conditions  that  are  necessary  for  optimality 
and  the  much  stronger  second-order  conditions  that  have  been  found 
useful  in  the  design  and  analysis  of  algorithms.  It  is  common 
practice  to  assume  (without  giving  any  real  mathematical  justifi¬ 
cation)  that  very  strong  optimality  conditions  are  satisfied  at  a 
minimizer,  and  to  base  convergence  proofs,  and  thus  to  justify 
algorithms,  on  the  basis  of  such  assumptions.  Of  course,  for  any 
given  problem,  those  a  priori  assumptions  cannot  be  checked,  unless 
the  solution  is  already  known. 

In  this  paper,  we  discuss  a  "generic"  approach  to  optimality 
conditions  that  has  been  developed  in  Spingarn  and  Rockafellar  [10] 
and Spingarn [7,8,9]. Rather than talking about conditions that are
necessary for optimality in specific problems, we discuss instead
conditions necessary for optimality for most problems in a family
of problems. More precisely, for a family (Q(p)) of nonlinear programming
problems indexed by a parameter $p \in P \subset R^n$ we study conditions
which, unless p belongs to a negligible set, hold at all local
minimizers for (Q(p)), where by negligible we mean a first category
set of measure zero in P.

This  approach  gives  a  rigorous  mathematical  underpinning  to 
the a priori assumption of conditions which are not truly necessary
for optimality, by describing the exact sense and the circumstances
in which these conditions can be expected to hold. Another attractive
feature of the theory is that "constraint qualifications",
which are normally required to prove the necessity of Kuhn-Tucker
type first-order conditions, need not be assumed to obtain conditions
which are merely generically necessary.

In  this  paper,  no  proofs  are  presented.  Instead,  we  refer 
the  reader  to  the  references  [7,8,10]. 

II.  A  simple  class  of  perturbations. 

Consider  the  basic  problem 

(Q)  min f(x) over all $x \in R^n$ such that
     $g(x) \le 0$ and $h(x) = 0$,

where the functions $f : R^n \to R$, $g : R^n \to R^m$, and $h : R^n \to R^k$
are continuously differentiable.

The standard first-order conditions for local optimality of
x in (Q) are that x should be feasible and there should exist
vectors $y \in R^m_+$ and $z \in R^k$ such that

(KT)  $\nabla f(x) + y'\nabla g(x) + z'\nabla h(x) = 0$
      and for all $i \notin I_+(x)$, $y_i = 0$,

where

$I_+(x) = \{i : 1 \le i \le m,\ g_i(x) = 0\}$.

These conditions are not actually necessary for optimality. They
are only necessary under an additional assumption called a "constraint
qualification", the simplest such being

(CQ)  $\{\nabla g_i(x) : i \in I_+(x)\} \cup \{\nabla h_j(x) : j = 1,\dots,k\}$
      is linearly independent.

When the functions f, g, and h are twice differentiable, a vector
x is said to satisfy the strong second-order conditions for local
optimality in (Q) if (CQ) holds, and there exist $y \in R^m_+$ and $z \in R^k$
such that (KT) holds with

      $y_i > 0$ for all $i \in I_+(x)$, and
      every nonzero $w \in R^n$ for which $w \cdot \nabla g_i(x) = 0$
      for all $i \in I_+(x)$ and $w \cdot \nabla h_j(x) = 0$ for all j also
      satisfies $w \cdot H(x,y,z)w > 0$,

where H(x,y,z) is the Hessian of the Lagrangian function in (Q):

$H(x,y,z) = \nabla^2 f(x) + \sum_{i=1}^{m} y_i \nabla^2 g_i(x) + \sum_{j=1}^{k} z_j \nabla^2 h_j(x)$.

These conditions are known to guarantee that x is an isolated
locally optimal solution to (Q). They also have other important
consequences, for example with respect to the sensitivity of x
to changes in a parameter; cf. Hestenes [3], Fiacco [1].
The strong conditions are useful for proving convergence results;
for example, cf. Robinson [5], Rockafellar [6], Powell [4],
Fiacco and McCormick [2].
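
To make the conditions (KT), (CQ), and the strong second-order conditions concrete, here is a minimal sketch that checks them for one toy problem. The problem itself (minimize $x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \le 0$, no equality constraints) and the function name are assumptions chosen for illustration; they do not come from the paper.

```python
def check_strong_conditions(x, y, tol=1e-9):
    """Check (KT), strict positivity of y on the active set, and positive
    curvature on the critical subspace for the toy problem
        minimize x1^2 + x2^2  subject to  g(x) = 1 - x1 - x2 <= 0.
    (CQ) holds automatically here: the single constraint gradient is nonzero."""
    x1, x2 = x
    grad_f = (2.0 * x1, 2.0 * x2)
    g = 1.0 - x1 - x2
    grad_g = (-1.0, -1.0)

    feasible = g <= tol
    active = abs(g) <= tol  # is the index in I_+(x)?

    # (KT): grad f + y * grad g = 0, y >= 0, and y = 0 off the active set.
    stationary = all(abs(grad_f[k] + y * grad_g[k]) <= tol for k in range(2))
    complementary = y >= 0.0 and (active or abs(y) <= tol)
    kt = feasible and stationary and complementary

    # Strong conditions require y_i > 0 for every active inequality.
    strict = (y > tol) if active else True

    # Hessian of the Lagrangian: g is affine, so H = Hessian of f = 2 * I.
    H = ((2.0, 0.0), (0.0, 2.0))
    if active:
        # The subspace {w : w . grad_g = 0} is spanned by (1, -1).
        w = (1.0, -1.0)
        Hw = (H[0][0] * w[0] + H[0][1] * w[1], H[1][0] * w[0] + H[1][1] * w[1])
        curvature = w[0] * Hw[0] + w[1] * Hw[1] > 0.0
    else:
        # No active constraints: need H positive definite on all of R^2.
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        curvature = H[0][0] > 0.0 and det > 0.0

    return {
        "KT": kt,
        "strict": strict,
        "curvature": curvature,
        "strong": kt and strict and curvature,
    }


if __name__ == "__main__":
    print(check_strong_conditions((0.5, 0.5), 1.0))
```

At the minimizer x = (1/2, 1/2) with multiplier y = 1 all three checks pass; with the wrong multiplier y = 0 the stationarity part of (KT) already fails.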

Let us embed (Q) in the following family of nonlinear programming
problems

(Q(v,u,t))  min $f(x) - x \cdot v$ over all $x \in R^n$
            such that $g(x) \le u$, $h(x) = t$.

The  original  problem  (Q)  then  coincides  with  Q(0,0,0).  Any  partic¬ 
ular  problem  in  this  family  may  be  "bad"  in  the  sense  that  the 
strong  conditions  may  fail  to  hold  at  some  local  minimizer  for 
that  problem.  However,  the  set  of  bad  problems  is  small,  as  the 
following  shows  [10]: 

THEOREM 1. Suppose f is of class $C^2$ and g and h are of class $C^{n-k}$.
Then except for (v,u,t) belonging to a set of measure zero in
$R^n \times R^m \times R^k$, (Q(v,u,t)) is such that every local optimal solution
x satisfies the strong second-order conditions.

III.  General  perturbations. 

Next,  we  examine  what  happens  when  more  general  families  of 
problems are allowed. The families we wish to consider are of
the form

(Q(p))  min f(x,p) over all x satisfying
        $g(x,p) \le 0$, $h(x,p) = 0$

with p ranging over some open subset P of Euclidean space.

The  family  Q(v,u,t)  just  considered  clearly  is  a  special  case. 

Obviously,  some  additional  assumption  is  required  in  order 
to  guarantee  that  the  strong  conditions  fail  only  in  a  negligible 
subfamily.  After  all,  we  could  start  with  a  "bad"  problem  (Q) 
for  which  the  strong  conditions  fail  at  some  local  minimum,  and 
then, by introducing trivial perturbations so that (Q(p)) =
(Q) for all p, we would obtain a family for which the conditions
fail for every problem. The problem here is that the indicated
family would not be "rich" enough; it would not contain enough
perturbations.

The following definitions specify two different ways a family
can be "rich". If g and h are of class $C^1$, let us say that the
family (Q(p)) is full with respect to constraints if the Jacobian
of the function $p' \mapsto (g(x,p'),h(x,p')) \in R^{m+k}$ has full rank m+k at
every $(x,p) \in R^n \times P$. For any $w = (x,y,z) \in R^r$ (r = n+m+k) and $p \in P$, let

$L(w,p) = f(x,p) + y'g(x,p) + z'h(x,p)$

be the Lagrangian for (Q(p)). If f, g, and h are of class $C^2$,
the family (Q(p)) will be called full provided the Jacobian of the
function $p' \mapsto \nabla_w L(w,p') \in R^r$ has full rank r at all $(w,p) \in R^r \times P$.
Every full family is automatically full with respect to constraints.
Those two properties are sufficient to guarantee the generic
necessity of the first-order (KT) and strong second-order conditions,
respectively:

THEOREM 2. (a) Let g and h be of class $C^s$ on $R^n \times P$ with
$s > \max(0, n-k)$ and let (Q(p)) be full with respect to constraints.
Then there is a subset $P' \subset P$ with negligible complement such that
if $p \in P'$ and x is a local minimizer for (Q(p)), then there exists
$(y,z) \in R^m_+ \times R^k$ satisfying (KT).

(b) Let f be of class $C^2$ and g and h of class $C^s$ on $R^n \times P$
with $s > \max(1, n-k)$. If (Q(p)) is full, then there is a subset
$P' \subset P$ with negligible complement such that for all $p \in P'$: if x
is a local minimizer for (Q(p)) there exists $(y,z) \in R^m_+ \times R^k$ satisfying
the strong second-order conditions.


To see how Theorem 2 can be applied, consider again the family
(Q(v,u,t)). We take p = (v,u,t), so for any w = (x,y,z),

$L(w,p) = f(x) - x \cdot v + y'(g(x) - u) + z'(h(x) - t)$.

We may then compute

$\nabla_w L(w,p) = \bigl(\nabla f(x) - v + y'\nabla g(x) + z'\nabla h(x),\ g(x) - u,\ h(x) - t\bigr)$

and hence $\nabla_p \nabla_w L(w,p) = -I_r$, where $I_r$ is the (n+m+k)-dimensional
identity matrix, which is trivially of rank n+m+k.

The full rank criteria given in Theorem 2 are sufficient,
but not necessary for the generic necessity of the strong conditions.
However, the rank criteria can be weakened (and thus the theorem
strengthened) slightly. To illustrate, consider the family

(Q(p))  minimize $x^4 + p^2 x$ over all $x \in R$.

The Lagrangian for (Q(p)) is $L(x,p) = x^4 + p^2 x$ (since there are no
constraints) so $\nabla_p \nabla_x L(x,p) = 2p$. For Theorem 2 to apply, it would
have to be true that $2p \neq 0$ for all p. This is not a real obstacle
though, since the theorem could be applied to the subfamily
$\{Q(p) : p \neq 0\}$. The same reasoning shows in general that the
result of the theorem holds whenever the set of p values for which
the rank condition fails is contained in a closed measure zero
subset of P:

COROLLARY 1. If there is a closed subset $P' \subset P$ of measure zero
such that the subfamily $\{(Q(p)) : p \in P \setminus P'\}$ is full with respect
to constraints [resp., full], then the conclusion of Theorem 2a [resp., of
Theorem 2b] holds.
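
The one-dimensional family preceding Corollary 1 can be worked through directly. The sketch below reads the objective as $x^4 + p^2 x$ (the reading consistent with the stated mixed derivative 2p); the function names are hypothetical. Since the objective is strictly convex, its unique minimizer solves $4x^3 + p^2 = 0$, and in the unconstrained case the strong second-order condition reduces to a positive second derivative there.

```python
def first_derivative(x, p):
    """d/dx of x^4 + p^2 * x."""
    return 4.0 * x**3 + p * p


def second_derivative(x):
    """d^2/dx^2 of x^4 + p^2 * x (independent of p)."""
    return 12.0 * x * x


def mixed_derivative(p):
    """d/dp d/dx of x^4 + p^2 * x; the rank condition requires this to be nonzero."""
    return 2.0 * p


def minimizer(p):
    """Unique minimizer of x^4 + p^2 * x, i.e. the real root of 4 x^3 + p^2 = 0."""
    return -((p * p / 4.0) ** (1.0 / 3.0))


def strong_condition_holds(p):
    """Unconstrained case: strong second-order condition = positive second derivative."""
    return second_derivative(minimizer(p)) > 0.0


if __name__ == "__main__":
    for p in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(p, minimizer(p), mixed_derivative(p), strong_condition_holds(p))
```

In this sketch the strong condition fails only at p = 0, which is also the only parameter where the mixed derivative vanishes; the set {0} is closed and of measure zero, as Corollary 1 requires.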

Another minor extension is suggested by the family

(Q(p))  minimize $p x^2 + (1 - p)x$ over all $x \in R$

where $p \in R$. In this case, $\nabla_p \nabla_x L(x,p) = 2x - 1$. For Theorem 2
to apply, it would have to be the case that $2x - 1 \neq 0$ for all
$x \in R$. Nonetheless, it is possible to conclude in such an instance
that except for p in a negligible set, the strong conditions hold
for (Q(p)) at all local minimizers other than possibly $x = \tfrac{1}{2}$:

COROLLARY 2. If there is a closed set $K \subset R^n$ such that the rank
condition of Theorem 2 holds except for $x \in K$, then the conclusion
of that theorem holds, except possibly at minimizers which are in K.

IV.  Families with selective perturbations.

We  are  confronted  with  additional  questions  when  we  consider 
a  family  like  the  following  one: 

(S(v,u,t))  min $f(x) - x \cdot v$ over $x \in R^n$
            subject to $g(x) \le u$, $h(x) = t$, and $x \ge 0$.




This family is identical to Q(v,u,t), with the important exception
that here there is an additional "fixed" constraint $x \ge 0$ that is
independent of the parameters. Neither Theorem 1 nor 2 can be
applied in this situation.

Those theorems would apply, were we to alter the family by
replacing the fixed constraint with a perturbed constraint $x \ge s$.
This would yield a family Q(v,u,t,s) for which the strong conditions
are necessary except for (v,u,t,s) in a set of measure zero.
However, the family of interest, namely (S(v,u,t)) = (Q(v,u,t,0)),
would be a measure zero subfamily of (Q(v,u,t,s)). Thus, although
the set of "bad" problems in (Q(v,u,t,s)) is negligible, it does
not follow that the bad problems in S(v,u,t) are negligible with
respect to S(v,u,t).

Rather  than  concentrate  on  this  particular  family,  we  study 
the  generic  behavior  of  more  general  families  of  the  form 

(S(p))  min f(x,p) over all $x \in R^n$
        subject to $g(x,p) \le 0$, $h(x,p) = 0$, and $x \in C$,

where C is a fixed set. For the family S(v,u,t), we would take
$C = R^n_+$, while the situation in Theorems 1 and 2 requires $C = R^n$.
Concerning the family (S(p)), we will address ourselves here to
three questions: (1) What reasonable assumptions can we impose
on the set C which allow us to develop a theory of generic second-order
conditions for (S(p))? Intuition suggests that C must be
"piecewise $C^2$-smooth" in some sense. (2) What are the appropriate
generic second-order conditions? It turns out that these conditions
actually depend on the set C, and are not always (but sometimes are)
exactly the same as the conditions that would be obtained by
replacing the constraint $x \in C$ with inequality or equality constraints
and then writing down the usual strong conditions for the problem
so obtained. (3) What "rank condition" ensures that these conditions
are generic for (S(p))?

We begin by stating our assumptions on the set C. These have
been incorporated into the definition of "cyrtohedron". The name
is taken from the Greek "κυρτός" (= curved, bent) + "ἕδρα" (= side),
and is motivated by the fact that these sets look like polyhedra,
except that the "faces", instead of being polyhedral, are submanifolds.

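Let $U \subset R^n$ be an open set, $G_\alpha$, $\alpha \in A$, and $H_\beta$, $\beta \in B$, finite
collections of differentiable functions on U. For any $A_0 \subset A$ and
$x \in U$, define

$T(x,A_0) = \{\nabla G_\alpha(x) : \alpha \in A_0\} \cup \{\nabla H_\beta(x) : \beta \in B\}$

$Z(A_0) = \{y \in U : 0 = G_\alpha(y) = H_\beta(y)\ \ \forall \alpha \in A_0,\ \forall \beta \in B\}$.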

A nonempty connected set $C \subset R^n$ is a cyrtohedron of class $C^s$ ($s \ge 1$)
if for every $\bar{x} \in C$, there are finitely many $C^s$ functions $G_\alpha$, $\alpha \in A$,
and $H_\beta$, $\beta \in B$, defined on a neighborhood $U \subset R^n$ of $\bar{x}$ such that
$\bar{x} \in Z(A)$ and

(a)  For all $x \in U$, $x \in C$ if, and only if,
     $G_\alpha(x) \le 0\ \ \forall \alpha \in A$ and $H_\beta(x) = 0\ \ \forall \beta \in B$.

(b)  If $\sum_{\alpha \in A} a_\alpha \nabla G_\alpha(\bar{x}) + \sum_{\beta \in B} b_\beta \nabla H_\beta(\bar{x}) = 0$ for some $a \in R^A_+$ and
     $b \in R^B$, then a = 0 and b = 0.

(c)  For each $A_0 \subset A$ there is an integer $s(A_0)$ such that
     rank $T(x,A_0) = s(A_0)$ for all $x \in U$.

Examples of cyrtohedra. (a) A differentiable submanifold in $R^n$
is a cyrtohedron for which the set A may always be taken to be
empty.

(b) Cyrtohedra for which the set A may always be taken either
empty or of cardinality one are submanifolds with boundary.

(c) A polyhedral convex set is the intersection of a finite
number of closed half-spaces in $R^n$.


(d) Sets that can be expressed as $C = \{x \in R^n : g_i(x) \le 0,
i = 1,\dots,m, \text{ and } h_j(x) = 0,\ j = 1,\dots,p\}$, where the functions $g_i$
and $h_j$ are of class $C^1$ and have the property that for every
$x \in C$, $\{\nabla g_i(x) : i \in I_+(x)\} \cup \{\nabla h_j(x) : j = 1,\dots,p\}$ is linearly
independent, where $I_+(x) = \{i : g_i(x) = 0\}$.


For an example of a simple set that is not a cyrtohedron,
consider the set $C \subset R^3$ which consists of all $x = (x_1,x_2,x_3)$ such
that $|x| \le 1$, $x_1 + x_3 \le 1$, and $-x_1 + x_3 \le 1$. For this set, there
exist no functions $G_\alpha$, $H_\beta$ which satisfy the above requirements in a
neighborhood of the point (0,0,1).

If C is a cyrtohedron, then U may always be chosen so that

(b')  For all $x \in U$, (b) holds with x in place of $\bar{x}$
(c')  If $A_0 \subset A_1 \subset A$ and $s(A_0) = s(A_1)$ then $Z(A_0) = Z(A_1)$
(d)   For all $A_0 \subset A$, $Z(A_0)$ is a connected $(n - s(A_0))$-dimensional
      submanifold

and when this is done, we will say that $(G_\alpha\ (\alpha \in A),\ H_\beta\ (\beta \in B),\ U)$, or
more briefly $(G_\alpha, H_\beta, U)$, is a local representation (abbr. l.r.)
for C.

Let $(G_\alpha, H_\beta, U)$ be a l.r., $x \in C \cap U$. Letting $A_+(x) = \{\alpha \in A : G_\alpha(x) = 0\}$, we define

$$L_C(x) = \{\xi \in R^n : \xi \cdot \nabla G_\alpha(x) = 0\ \forall \alpha \in A_+(x),\ \xi \cdot \nabla H_\beta(x) = 0\ \forall \beta \in B\},$$

$$N_C(x) = \Big\{ \sum_{\alpha \in A_+(x)} a_\alpha \nabla G_\alpha(x) + \sum_{\beta \in B} b_\beta \nabla H_\beta(x) : a \in R^{A_+(x)}_+ \text{ and } b \in R^B \Big\}.$$

$N_C(x)$ is the normal cone to $C$ at $x$, and $L_C(x)$ is the linear approximation to $C$ at $x$; the latter is the tangent space at $x$ to the "face" (definition below) of $C$ containing $x$. The dimension of $C$ is defined to be $\dim C = n - |B|$. It does not depend on $x$, and none of these definitions depend on the particular local representation chosen.

For $x, y \in C$, define an equivalence relation ~ by specifying $x \sim y$ if, and only if, there is a sequence $x = x_0, x_1, \dots, x_p = y$ in $C$ such that for each pair $(x_i, x_{i+1})$ $(i = 0, \dots, p-1)$, there is a l.r. $(G_\alpha, H_\beta, U)$ such that $Z(A) \supset \{x_i, x_{i+1}\}$. The equivalence classes under this relation are the faces of $C$.

A few examples help to clarify the latter definition:

(a) The faces of a polyhedral convex set are the relative interiors of its "faces" in the usual sense (that is, subsets which are the intersection with some supporting hyperplane).

(b) A submanifold $C \subset R^n$ has only one face.

(c) If $C$ is the hemisphere $C = \{x = (x_1, \dots, x_n) \in R^n : |x| \le 1 \text{ and } x_n \ge 0\}$, then $C$ has four faces, corresponding to the choices of equality or strict inequality in the definition of $C$:

$F_1 = \{x : |x| < 1 \text{ and } x_n > 0\}$

$F_2 = \{x : |x| = 1 \text{ and } x_n > 0\}$

$F_3 = \{x : |x| < 1 \text{ and } x_n = 0\}$

$F_4 = \{x : |x| = 1 \text{ and } x_n = 0\}$
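As an informal illustration of example (c), the following Python sketch assigns a point of the hemisphere to one of the faces $F_1, \dots, F_4$ by testing which of the two defining constraints are active. The function name `hemisphere_face` and the tolerance `tol` are our own illustrative choices and are not part of the original text.

```python
import math

def hemisphere_face(x, tol=1e-12):
    """Return "F1".."F4" for the face of C containing x, or None if x is not in C.

    C = {x in R^n : |x| <= 1 and x_n >= 0}. The tolerance tol is a
    hypothetical parameter deciding when a constraint counts as active.
    """
    norm = math.sqrt(sum(c * c for c in x))
    last = x[-1]
    if norm > 1 + tol or last < -tol:
        return None                      # x violates a constraint
    on_sphere = abs(norm - 1) <= tol     # |x| = 1 holds with equality
    on_plane = abs(last) <= tol          # x_n = 0 holds with equality
    if not on_sphere and not on_plane:
        return "F1"
    if on_sphere and not on_plane:
        return "F2"
    if not on_sphere and on_plane:
        return "F3"
    return "F4"
```

For instance, `hemisphere_face((0.0, 0.0, 0.5))` lies in the interior face, while `hemisphere_face((1.0, 0.0, 0.0))` lies on the rim where both constraints are active.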

To state the optimality conditions, we need some more definitions. Consider a specific problem

(S) min $f(x)$ over all $x \in R^n$ such that $g(x) \le 0$, $h(x) = 0$, and $x \in C$.

If $x$ is feasible for (S), the independence criterion (IC) is satisfied for (S) at $x$ if for any $a \in R^m$ and $b \in R^k$ with $a_i = 0$ for all $i \notin I_+$,

(IC) $\sum_{i=1}^{m} a_i \nabla g_i(x) + \sum_{j=1}^{k} b_j \nabla h_j(x) \in L_C(x)^\perp$ implies $0 = a = b$.

It is trivially satisfied if $m = k = 0$. If $C = R^n$, IC says that the gradients of the active constraints at $x$ are linearly independent. More generally, IC says that the projections of the gradients of $g_i$, $i \in I_+$, and $h_j$ at $x$ onto $L_C(x)$ form a linearly independent set.

A set $M \subset R^n$ is a k-dimensional $C^s$ submanifold ($s \ge 1$) if for each $x \in M$ there is an open set $U \subset R^k$ and a $C^s$ diffeomorphism $\phi$ mapping $U$ onto a neighborhood of $x$ in $M$. For any $x = \phi(q) \in M$, $M_x = \text{range } d\phi(q)$ is the tangent space to $M$ at $x$. If $f : R^n \to R$, then "$f|M$" denotes the restriction of $f$ to $M$. For any $x \in R^n$, "$\nabla f(x)$" denotes the ordinary gradient of $f$ at $x$, while "$\nabla(f|M)(x)$" denotes the gradient of $f|M$ at $x$, the latter being a linear function on $M_x$. If $\nabla(f|M)(x) = 0$ (i.e., if $\nabla f(x)$ is perpendicular to $M_x$), then $x$ is a critical point for $f$ on $M$, and in this case the Hessian for $f|M$ at $x = \phi(q)$ is the bilinear function on $M_x$ defined by

$$(\nabla^2 (f|M)(x))(u,v) = (\nabla^2 (f \circ \phi)(q))(\bar{u},\bar{v})$$

where $u = d\phi(q)\bar{u}$, $v = d\phi(q)\bar{v}$, and $\nabla^2 (f \circ \phi)(q)$ is the ordinary Hessian of $f \circ \phi$. If $\nabla^2 (f \circ \phi)(q)$ is nonsingular, then $x$ is a nondegenerate critical point for $f$ on $M$.

Suppose henceforth that $f$, $g$, and $h$ are of class $C^2$ on $R^n$, and that $C \subset R^n$ is a cyrtohedron of class $C^2$. We extend the definition of the strong second order conditions to the problem (S) by declaring a point $w = (x,y,z)$ with $x \in C$, $y \in R^m$, and $z \in R^k$ to satisfy the conditions whenever

(SSOC)

(i) $x$ is feasible for (S)

(ii) $-\nabla_x L(w) \in \text{relint } N_C(x)$

(iii) For $i \in I$, $y_i > 0$ if, and only if, $g_i(x) = 0$

(iv) The independence criterion for (S) holds at $x$

(v) If $F$ is the face of $C$ containing $x$, then $(\nabla^2 (L|F)(w))(\xi,\xi) > 0$ for all $\xi \in R^n$ satisfying $0 \ne \xi \in L_C(x)$, and $\xi \cdot \nabla g_i(x) = \xi \cdot \nabla h_j(x) = 0$ for all $i \in I_+$, and all $j$.


As before, we say the family $(S(p))$ is full provided the map $p' \mapsto \nabla_w L(w,p') \in R^r$ has full rank $r$ at all $(w,p) \in R^r \times P$. We now have covered all the preliminaries needed to state the final result.

THEOREM 3. Let $C \subset R^n$ be a d-dimensional cyrtohedron of class

CS ,  P  open ,  f  of  class  and  g  and  h  of  class  CS  on  Rn  *  P  with 

$s \ge \max\{1, d - k\}$. If $(S(p))$ is full, there is a subset $P_0 \subset P$ with $P \setminus P_0$ negligible such that for all $p \in P_0$: if $x \in C$ is a local minimizer for $(S(p))$ there exists $(y,z) \in R^m_+ \times R^k$ satisfying SSOC.


Of course, this result can be slightly improved in the manner of Corollaries 1 and 2.

V. Comparison with the classical conditions.

For problems of the form (Q) we have seen that under mild assumptions, the classical strong conditions

(SC) i) $x$ is feasible for (Q).

ii) $\nabla f(x) + \sum_i y_i \nabla g_i(x) + \sum_j z_j \nabla h_j(x) = 0$.

iii) Strict complementary slackness: $y_i > 0 \iff g_i(x) = 0$.

iv) The gradients of the active constraints, i.e. $\{\nabla g_i(x) : i \in I_+\} \cup \{\nabla h_j(x) : j = 1,\dots,k\}$, form a linearly independent set.

v) For any $\xi \in R^n$ satisfying $\xi \ne 0$, $\xi \cdot \nabla g_i(x) = 0$ $\forall i \in I_+$, and $\xi \cdot \nabla h_j(x) = 0$, $j = 1,\dots,k$, we have $\xi \cdot \big(\nabla^2 f(x) + \sum_i y_i \nabla^2 g_i(x) + \sum_j z_j \nabla^2 h_j(x)\big)\xi > 0$

are generically necessary for optimality in families of problems containing (Q) (cf. Theorems 1 and 2), and that for problems of the form (S) (i.e., families with fixed cyrtohedron constraints), the more general conditions SSOC are generically necessary for optimality.

Locally, the fixed set $C$ can be represented by inequality and equality constraints; if $(G_\alpha, H_\beta, U)$ is a local representation for $C$, then $C \cap U = \{x \in U : G_\alpha(x) \le 0,\ \alpha \in A,\ H_\beta(x) = 0,\ \beta \in B\}$. So, at least locally, (S) is equivalent to a problem (Q') of the type (Q) (i.e., without "fixed" constraints):

(Q') min $f(x)$ subject to $g_i(x) \le 0$, $i = 1,\dots,m$, $h_j(x) = 0$, $j = 1,\dots,k$, $G_\alpha(x) \le 0$, $\alpha \in A$, $H_\beta(x) = 0$, $\beta \in B$.

It is natural to ask what the relationship is between the conditions SSOC for (S) and SC for (Q').

In most cases, the two sets of conditions are essentially equivalent in the following sense. If $(x,y,a,z,b) \in R^n \times R^m_+ \times R^A_+ \times R^k \times R^B$ satisfies SC for (Q'), then $(x,y,z)$ satisfies SSOC for (S). If $(x,y,z) \in R^n \times R^m \times R^k$ satisfies SSOC for (S), then it is possible to find $a \in R^A_+$ and $b \in R^B$ such that $(x,y,a,z,b)$ satisfies SC i, ii, iii. And for any such $a$ and $b$, SCv will automatically hold for (Q'). However, SCiv may fail. For example, if $C$ is a four-sided pyramid in $R^3$ with apex $x$, SCiv can never be satisfied for (Q') because no set of four vectors in $R^3$ can be linearly independent. However, SSOCiv can (and usually will) be satisfied at $x$. In fact, $(x,y,z)$ will satisfy SSOCiv if and only if the projections onto $L_C(x)$ of the gradients of the (nonfixed) constraints active at $x$ are linearly independent. But $L_C(x) = \{0\}$ in this case, so SSOCiv merely says that there are no active constraints at $x$. Of course, one would expect the generic conditions to assert this. If $k > 0$, one would expect the apex of the pyramid to be a minimizer with probability zero. If $k = 0$, it is not unusual that the apex should be a minimizer, but one would expect one or more of the inequality constraints to be active there only with probability zero.

In the most common cases, such as C = , the set $C$ will be expressible as the set of points which satisfy a finite number of equality and inequality constraints with linearly independent gradients (cf. section III, example (d) under "examples of cyrtohedra"). Then, the two sets of conditions are essentially the same. The main difference is that in the SSOC formulation, no multipliers are associated with the constraints defining the cyrtohedron.

We also remark that the SSOC formulation suggests what the generic conditions should look like if we generalize them to a wider class of fixed sets $C$. Consider, for example, the set

$$C = \{x = (x_1,x_2,x_3) \in R^3 : |x| \le 1 \text{ and } x_1 + x_2 + x_3 \ge |x|\}.$$

Because no local representation exists for $C$ near $x = 0$, $C$ is not a cyrtohedron. But, like a cyrtohedron, $C$ can be partitioned into "faces" (four in this case) that are submanifolds, and $N_C(x)$ and $L_C(x)$ have obvious meanings, so the conditions SSOC, as stated above, are still meaningful. In fact, $C$ has all the properties that are required for our proof of the genericity of SSOC. For such a set $C$, it would be impossible to reformulate the problem (S) as a problem in the form of (Q'), so the old conditions SC have no bearing here, although the new conditions SSOC would apply and can be shown to be generically necessary for optimality. We do not know if there is a "natural" broader class to which our results apply. The above example suggests conditions should be generic for sets $C$ that look (in some sense) locally like the intersection of a cone with a neighborhood of the origin. One possible class would be those sets $C$ such that each $x \in C$ has a neighborhood $U$ such that for some diffeomorphism $\phi$, and some closed convex cone $K$, $\phi(x) = 0$ and $\phi(C \cap U) = \phi(U) \cap K$. For this class, the proof of the genericity of the above conditions does indeed go through. However, this is not as broad a class as we would like; it does not seem even to include the class of cyrtohedra.

The class of "lower-$C^1$" functions, that is, functions which arise by taking the maximum of a compact family of $C^1$ functions, is characterized in terms of properties of the Clarke subdifferential. A locally Lipschitz function is shown to be lower-$C^1$ if, and only if, its subdifferential is "strictly submonotone". Other properties of functions with "submonotone" subdifferentials

0. Introduction

One of the nice features of convex optimization is the link with "monotone" mappings. Due to this, convex problems can be rephrased as "variational problems", often resulting in considerable simplification. This can be useful for theoretical reasons, by emphasizing when the central justification for a proof or procedure is the monotonicity of the subdifferential. For example, Rockafellar [7,8] has exploited the link between monotone mappings and saddle functions to unify and simplify the existing theory of multiplier methods in convex programming.

It is the aim of this paper to show that a concept closely related to monotonicity, namely "submonotonicity", also plays a natural role in the analysis of nondifferentiable, nonconvex problems. We will do this by demonstrating how properties of nondifferentiable functions can be related to monotone-type properties of their Clarke subdifferentials.

Our most important result appears in section IV, where a complete characterization is obtained, in terms of properties of the Clarke subdifferential, for the class of "lower-$C^1$" functions, that is, functions that arise by taking the maximum of a compact family of $C^1$ functions. It is shown that these functions are precisely those locally Lipschitz functions whose Clarke subdifferentials are "strictly submonotone".

In section III, some implications of the submonotonicity property are developed, and several equivalent characterizations are given. This concept is then contrasted with properties that have been discussed by other authors. Among these are regularity in the sense of Clarke [2], quasi-differentiability in the sense of Pshenichnyi [5], lower semi-differentiability in the sense of Rockafellar [9], and semismoothness in the sense of Mifflin [4].

We wish to thank Professor Rockafellar for sharing many valuable insights with us.

I. Notation

$R^n$ denotes Euclidean space with the usual inner product $x \cdot y = \langle x,y \rangle = \sum x_i y_i$. The closed unit ball in $R^n$ is denoted by $B = \{x \in R^n : |x| \le 1\}$.

If $K \subset R^n$ is a compact convex set, then $\psi^*_K$ is the support function of $K$, defined by $\psi^*_K(u) = \sup\{\langle u,x \rangle : x \in K\}$. For any $u \in R^n$, we let $K_u = \{x \in K : \langle u,x \rangle = \psi^*_K(u)\}$.

The notation $T : R^n \rightrightarrows R^n$ indicates that $T$ is a set-valued mapping. $T$ is closed provided the set $\{(x,y) : y \in T(x)\}$ is closed. $T$ is locally bounded if for every $\bar{x} \in R^n$ there is $\varepsilon > 0$ and $R > 0$ such that $y \in T(x)$, $|x - \bar{x}| < \varepsilon$ implies $|y| < R$.
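To make the support function $\psi^*_K$ and the set $K_u$ concrete, here is a small Python sketch for the special case where $K$ is the convex hull of finitely many points. The function names `support` and `exposed_vertices` are our own and serve only as an illustration; for such a $K$ the supremum is attained at a vertex, and $K_u$ is the convex hull of the maximizing vertices.

```python
def support(points, u):
    """Support function of K = conv(points), evaluated at the direction u."""
    return max(sum(ui * pi for ui, pi in zip(u, p)) for p in points)

def exposed_vertices(points, u, tol=1e-12):
    """Vertices of K on which <u, .> attains the support value.

    The convex hull of the returned points is the set K_u; tol is a
    hypothetical tolerance for comparing floating-point inner products.
    """
    best = support(points, u)
    return [p for p in points
            if abs(sum(ui * pi for ui, pi in zip(u, p)) - best) <= tol]
```

With the square `[(1, 1), (1, -1), (-1, 1), (-1, -1)]` and `u = (1, 0)`, the support value is attained on the whole right edge, so two vertices are returned; with `u = (1, 1)` only the corner `(1, 1)` is returned.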


We will say the sequence $(x_n)$ converges to $x$ in the direction $u \in R^n$, written $x_n \to_u x$, provided either $x_n \to x$ and $u = 0$, or $u \ne 0$, $x_n \to x$, $\frac{x_n - x}{|x_n - x|} \to \frac{u}{|u|}$, and $x_n \ne x$ for all $n$.

If $f : R^n \to R$, the directional derivative of $f$ at $x$ (when it exists) is

$$f'(x;u) = \lim_{t \downarrow 0} \frac{f(x+tu) - f(x)}{t}.$$

II. Submonotonicity

In this section, $T : R^n \rightrightarrows R^n$ denotes a convex-valued closed multifunction. $T$ will be called submonotone at $x \in R^n$ provided

$$\liminf_{\substack{x' \to x,\ x' \ne x \\ y \in T(x),\ y' \in T(x')}} \frac{\langle x' - x,\ y' - y \rangle}{|x' - x|} \ge 0.$$

($T$ is trivially submonotone at $x$ if $T(x) = \emptyset$.) $T$ is directionally upper semicontinuous (d.u.s.c.) at $x$ provided that for all $u \in R^n$, whenever $x_k \to_u x$ and $y_k \in T(x_k)$ for all $k$, then for every $\varepsilon > 0$ there exists $k_0$ such that

$$T(x_k) \subset T(x)_u + \varepsilon B \quad \forall k \ge k_0.$$

For $u = 0$, this is automatically satisfied since $T$ is assumed to be closed. If $T$ is locally bounded near $x$ then $T$ is d.u.s.c. at $x$ if, and only if, for all $u \ne 0$, whenever $x_k \to_u x$ and $T(x_k) \ni y_k \to y$, then $y \in T(x)_u$. If $T$ is submonotone [respectively, d.u.s.c.] at all $x \in R^n$, then $T$ is submonotone [resp., d.u.s.c.].

(2.1) THEOREM. Let $T : R^n \rightrightarrows R^n$ be locally bounded near $x$ (as is the case if $T = \partial f$ with $f$ locally Lipschitz). Then $T$ is d.u.s.c. at $x$ if, and only if, $T$ is submonotone at $x$.

Proof. If $T$ is not submonotone at $x$, there is $\varepsilon > 0$ and there are sequences $x_n \to x$, $x_n \ne x$, $y_n \in T(x_n)$, $y'_n \in T(x)$, such that

$$\frac{\langle x_n - x,\ y_n - y'_n \rangle}{|x_n - x|} < -\varepsilon < 0, \quad \forall n.$$

We may clearly assume $x_n \to_u x$ for some $u \ne 0$, and since $T$ is closed and locally bounded, that $y_n \to y \in T(x)$ and $y'_n \to y' \in T(x)$. Then $\psi^*_{T(x)}(u) > \langle u,y' \rangle - \varepsilon|u| \ge \langle u,y \rangle$, so $y \notin T(x)_u$ and $T$ is not d.u.s.c.

Suppose that $T$ is submonotone at $x$. Let $x_n \to_u x$, $u \ne 0$, $y_n \in T(x_n)$, $y_n \to y$. Since $T$ is closed and locally bounded, $y \in T(x)$ and we will be done if we can show $y \in T(x)_u$. If $z \in T(x)$,

$$(y - z) \cdot \frac{u}{|u|} = \lim_n \frac{\langle y_n - z,\ x_n - x \rangle}{|x_n - x|} \ge 0$$

since $T$ is submonotone at $x$. Since this holds for all $z \in T(x)$, $y \cdot u \ge \psi^*_{T(x)}(u)$, showing that $T$ is d.u.s.c. at $x$. ∎

Of course if $f : R^n \to R$ is convex, $\partial f$ is monotone, and hence submonotone. The fact that $\partial f$ is directionally upper semicontinuous is proved by Rockafellar [6, Theorem 24.6].

The multifunction $T : R^n \rightrightarrows R^n$ will be called strictly submonotone at $x$ provided

$$\liminf_{\substack{x_1 \ne x_2,\ x_i \to x,\ i = 1,2 \\ y_i \in T(x_i),\ i = 1,2}} \frac{\langle x_1 - x_2,\ y_1 - y_2 \rangle}{|x_1 - x_2|} \ge 0.$$

Strict submonotonicity clearly implies submonotonicity.

Next, we state a characterization of strict submonotonicity similar to the one provided in Theorem 2.1 for submonotonicity. The proof is similar, so it has been omitted.

(2.2) THEOREM. Let $T : R^n \rightrightarrows R^n$ be locally bounded near $x$. Then $T$ is strictly submonotone at $x$ if, and only if, whenever $x_n \to x$, $x'_n \to x$, $y_n \in T(x_n)$, $y'_n \in T(x'_n)$, $y_n \to y$, $y'_n \to y'$, and $x_n - x'_n \to_u 0$, one also has $u \cdot y' \le u \cdot y$.
III. Lipschitzian functions

Next, we turn our attention to a particular class of multifunctions, namely those that are the Clarke generalized gradient mapping [1] for a locally Lipschitz function $f : R^n \to R$. Thus, if $T = \partial f$, we ask what the submonotonicity of $\partial f$ implies about $f$.

If $f$ is locally Lipschitz, the Clarke derivative of $f$ is the function

$$f^\circ(x;u) = \limsup_{\substack{t \downarrow 0 \\ h \to 0}} \frac{f(x+h+tu) - f(x+h)}{t}.$$

$f^\circ(x;\cdot)$ is a continuous sublinear function which is the support function of the compact convex set $\partial f(x)$ called the Clarke generalized gradient of $f$ at $x$. For every $u, v \in R^n$, $f^\circ(x;\cdot)$, being a finite convex function, possesses a finite directional derivative at $u$ in the direction $v$ which we denote by $f^\circ(x;u;v)$. Alternatively, we could define $f^\circ(x;u;\cdot)$ to be the support function of $\partial f(x)_u$. Clearly $f^\circ(x;0;\cdot) = f^\circ(x;\cdot)$. Let us also define

$$f^{**}(x;u;v) = \begin{cases} \displaystyle \limsup_{\substack{h \to_u 0 \\ t/|h| \downarrow 0}} \frac{f(x+h+tv) - f(x+h)}{t} & \text{if } u \ne 0 \\[2ex] f^\circ(x;v) & \text{if } u = 0 \end{cases}$$

Clearly $f^{**}(x;u;v) \le f^\circ(x;v)$. Also, $f^{**}(x;u;\cdot)$ is sublinear, so $f^{**}(x;u;\cdot)$ is the support function of some subset of $\partial f(x)$. As we shall see, the case where that subset is $\partial f(x)_u$ corresponds to the case where $\partial f$ is submonotone or, equivalently, d.u.s.c. To see that $f^{**}(x;u;\cdot)$ is sublinear, note that

$$\begin{aligned} f^{**}(x;u;v_1+v_2) &= \limsup \frac{f(x+h+t(v_1+v_2)) - f(x+h)}{t} \\ &\le \limsup \frac{f(x+(h+tv_1)+tv_2) - f(x+(h+tv_1))}{t} + \limsup \frac{f(x+(h+tv_1)) - f(x+h)}{t} \\ &= f^{**}(x;u;v_2) + f^{**}(x;u;v_1). \end{aligned}$$
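The difference between the Clarke derivative and the ordinary directional derivative can be seen numerically. The sketch below is only a crude sampled stand-in for the lim sup in the definition of $f^\circ(x;u)$, for functions of one real variable; the name `clarke_derivative_1d` and the parameters `radius` and `steps` are our own illustrative assumptions. For $f(x) = -|x|$ one has $f'(0;1) = -1$, whereas $f^\circ(0;1) = 1$.

```python
def clarke_derivative_1d(f, x, u, radius=1e-3, steps=50):
    """Grid estimate of the Clarke derivative f°(x;u) for f: R -> R.

    Returns the largest difference quotient over base points x + h with
    |h| <= radius and step sizes 0 < t <= radius. This samples the lim sup
    on a finite grid; it is an approximation, not an exact evaluation.
    """
    best = float("-inf")
    for i in range(-steps, steps + 1):
        h = radius * i / steps
        for j in range(1, steps + 1):
            t = radius * j / steps
            quotient = (f(x + h + t * u) - f(x + h)) / t
            best = max(best, quotient)
    return best

neg_abs = lambda s: -abs(s)
estimate = clarke_derivative_1d(neg_abs, 0.0, 1.0)   # close to 1
one_sided = (neg_abs(1e-8) - neg_abs(0.0)) / 1e-8    # close to -1
```

The grid contains base points $h < 0$ with $h + t \le 0$, where the quotient for $-|x|$ equals 1; this is why the estimate recovers the value of the lim sup in this example.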


(3.1) THEOREM. Let $f : R^n \to R$ be locally Lipschitz. $\partial f$ is d.u.s.c. at $x$ if, and only if, $f^\circ(x;u;v) = f^{**}(x;u;v)$ for all $u, v \in R^n$.

Proof: ($\Leftarrow$) Let $u \ne 0$ (if $u = 0$, the assertion is trivial), $x_k \to_u x$, $\partial f(x_k) \ni y_k \to y$. To show $\partial f$ is d.u.s.c., it must be demonstrated that $y \in \partial f(x)_u$. Fix an arbitrary $v \in R^n$. Then

$$v \cdot y_k \le f^\circ(x_k; v) = \limsup_{\substack{h \to 0 \\ t \downarrow 0}} \frac{f(x_k+h+tv) - f(x_k+h)}{t}$$

so $h_k$, $t_k > 0$ can be found with

$$v \cdot y_k - \frac{1}{k} \le \frac{f(x_k+h_k+t_k v) - f(x_k+h_k)}{t_k},$$

where $h_k$ and $t_k$ may be taken so small that $\max\{|h_k|, t_k\} \le |x_k - x|/k$. Hence,

$$v \cdot y = \lim_k v \cdot y_k \le \limsup_k \frac{f(x_k+h_k+t_k v) - f(x_k+h_k)}{t_k} \le f^{**}(x;u;v),$$

where the last inequality follows from the fact that $x_k - x + h_k \to_u 0$ and $t_k/|x_k - x + h_k| \downarrow 0$. But $f^{**}(x;u;v) = f^\circ(x;u;v)$ by assumption, so $v \cdot y \le f^\circ(x;u;v) = \psi^*_{\partial f(x)_u}(v)$ for all $v$, which implies that $y \in \partial f(x)_u$.


($\Rightarrow$) Fix $u \ne 0$, $v \in R^n$. First we show that $f^\circ(x;u;v) \ge f^{**}(x;u;v)$. Pick sequences $h_n \to_u 0$, $t_n/|h_n| \downarrow 0$ such that

$$f^{**}(x;u;v) = \lim_n \frac{f(x+h_n+t_n v) - f(x+h_n)}{t_n}.$$

By the mean-value property [Lebourg, 3], there is, for each $n$, $y_n \in \partial f(x+h_n+c_n t_n v)$ with $0 < c_n < 1$ such that

$$v \cdot y_n = \frac{f(x+h_n+t_n v) - f(x+h_n)}{t_n}.$$

Without loss of generality, we can assume that $y_n \to y$ for some $y \in \partial f(x)$. Since $\partial f$ is assumed to be d.u.s.c. at $x$, we have $y \in \partial f(x)_u$. Hence $f^{**}(x;u;v) = \lim v \cdot y_n = v \cdot y \le \psi^*_{\partial f(x)_u}(v) = f^\circ(x;u;v)$, as desired.
To prove the opposite inequality, fix $u \ne 0$, $v \in R^n$, $w \in \partial f(x)_u$, and we will show $w \cdot v \le f^{**}(x;u;v)$. From this, the desired inequality follows by taking the supremum in $w$.

By d.u.s.c., we may find $\delta_n > 0$ $(n = 1,2,\dots)$ such that $0 < \delta \le \delta_n$ implies

$$\partial f\big(x + \delta(u + \tfrac{1}{n} v)\big) \subset \partial f(x)_{u + \frac{1}{n} v} + \tfrac{1}{n^2} B.$$

Clearly we may assume $\delta_n \to 0$. Let $x_n = x + \delta_n(u + \tfrac{1}{n} v)$ and choose $y_n \in \partial f(x_n)$. Then $x_n \to_u x$ and $y_n \in \partial f(x)_{u + \frac{1}{n} v} + \tfrac{1}{n^2} B$. Since $y_n \in \partial f(x_n)$, we may find $t_n > 0$ and $h_n \in R^n$ such that

$$y_n \cdot v - \frac{1}{n} \le \frac{f(x_n+h_n+t_n v) - f(x_n+h_n)}{t_n}, \qquad \max\{|h_n|, t_n\} \le |x_n - x|/n.$$

Next, we will show that $\liminf y_n \cdot v \ge w \cdot v$. Since $x_n + h_n - x \to_u 0$ and $t_n/|x_n - x + h_n| \downarrow 0$, this will imply

$$w \cdot v \le \liminf_n \frac{f(x_n+h_n+t_n v) - f(x_n+h_n)}{t_n} \le f^{**}(x;u;v)$$

which is the desired result.

For each $n$, choose $y'_n \in \partial f(x)_{u + \frac{1}{n} v}$ such that $|y_n - y'_n| \le \tfrac{1}{n^2}$. Then

$$\begin{aligned} y_n \cdot (u + \tfrac{1}{n} v) &= y'_n \cdot (u + \tfrac{1}{n} v) + (y_n - y'_n) \cdot (u + \tfrac{1}{n} v) \\ &\ge w \cdot (u + \tfrac{1}{n} v) - \tfrac{1}{n^2} |u + \tfrac{1}{n} v| && (\text{because } w \in \partial f(x),\ y'_n \in \partial f(x)_{u + \frac{1}{n} v}) \\ &\ge y'_n \cdot u + \tfrac{1}{n} w \cdot v - \tfrac{1}{n^2} |u + \tfrac{1}{n} v| && (\text{because } w \in \partial f(x)_u,\ y'_n \in \partial f(x)) \\ &\ge y_n \cdot u + \tfrac{1}{n} w \cdot v - \tfrac{1}{n^2} \big(|u| + |u + \tfrac{1}{n} v|\big) && (\text{because } |y_n - y'_n| \le \tfrac{1}{n^2}). \end{aligned}$$

So

$$y_n \cdot v \ge w \cdot v - \tfrac{1}{n} \big(|u| + |u + \tfrac{1}{n} v|\big)$$

and hence $\liminf y_n \cdot v \ge w \cdot v$, as desired. □


Combining our results so far, we obtain the following:

(3.2) COROLLARY. If $f : R^n \to R$ is locally Lipschitz, then the following are equivalent:

i. $\partial f$ is submonotone at $x$

ii. $\partial f$ is d.u.s.c. at $x$

iii. $f^{**}(x;\cdot;\cdot) = f^\circ(x;\cdot;\cdot)$.

Now that we have acquired a better understanding of the submonotonicity property of $\partial f$ and what it implies about $f$, a logical question to ask next is: Just how strong is this property? In other words, if we take a look at "regularity" or "subdifferentiability" properties that have been studied for nondifferentiable functions by other authors, then which of these imply or are implied by the submonotonicity of $\partial f$?

A locally Lipschitz function $f : R^n \to R$ is said to be semismooth at $x \in R^n$ [Mifflin, 4] provided that $x_k \to_u x$ and $y_k \in \partial f(x_k)$ imply that $\langle u,y_k \rangle \to f'(x;u)$.

(3.3) PROPOSITION. If $\partial f$ is submonotone at $x$ then $f$ is semismooth at $x$.

Proof. If $x_k \to_u x$ and $y_k \in \partial f(x_k)$ then every subsequence of $(y_k)$ has a subsubsequence converging to some point in $\partial f(x)_u$ by directional upper semicontinuity. Hence $\langle u,y_k \rangle \to \psi^*_{\partial f(x)}(u)$. By Proposition 3.5, $\psi^*_{\partial f(x)}(u) = f'(x;u)$. ∎

The function $f(x) = -|x|$ is semismooth, but $\partial f$ is not submonotone at $x = 0$, so the converse of 3.3 is false.
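The failure of submonotonicity for $-|x|$ at $0$ can be checked directly from the defining quotient. The following Python sketch, for functions on $R$ whose subdifferentials are intervals, evaluates $\langle x' - x, y' - y \rangle / |x' - x|$ at sampled points; since the quotient is linear in $y$ and $y'$, only interval endpoints need to be tested. The helper names are our own and purely illustrative.

```python
def worst_quotient(subdiff, x, samples):
    """Smallest sampled value of (x' - x)(y' - y)/|x' - x|.

    subdiff(p) returns the endpoints (lo, hi) of the interval ∂f(p);
    y ranges over the endpoints of ∂f(x) and y' over those of ∂f(x').
    """
    worst = float("inf")
    for xp in samples:
        if xp == x:
            continue
        for y in subdiff(x):
            for yp in subdiff(xp):
                worst = min(worst, (xp - x) * (yp - y) / abs(xp - x))
    return worst

def subdiff_abs(p):
    """Clarke subdifferential of |x| as an interval (lo, hi)."""
    if p == 0:
        return (-1.0, 1.0)
    return (1.0, 1.0) if p > 0 else (-1.0, -1.0)

def subdiff_neg_abs(p):
    """Clarke subdifferential of -|x|, the reflection of that of |x|."""
    lo, hi = subdiff_abs(p)
    return (-hi, -lo)

samples = [sign * 10.0 ** (-k) for k in range(1, 7) for sign in (1, -1)]
```

With these samples, `worst_quotient(subdiff_abs, 0.0, samples)` stays nonnegative, as expected for a convex function, while `worst_quotient(subdiff_neg_abs, 0.0, samples)` equals $-2$ however close the samples are to $0$.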

Following Pshenichnyi [5], let us say that $f$ is quasi-differentiable at $x$ if there is a closed convex set $K$ such that $f'(x;\cdot) = \psi^*_K(\cdot)$. The function $f(x) = -|x|$ is not quasi-differentiable, so it is natural to ask whether every locally Lipschitz function which is both semismooth and quasi-differentiable has a submonotone subgradient mapping. The answer is negative.

Consider the function $f : R^2 \to R$ defined as follows:

$$f(a,b) = \begin{cases} 0 & \text{if } a \le 0 \\ a^2/4 & \text{if } a > 0,\ |b| \ge a^2/2 \\ |b| - b^2/a^2 & \text{if } a > 0,\ |b| \le a^2/2 \end{cases}$$

Then $f$ is differentiable at all points where either $b \ne 0$ or $a < 0$. At all points $x = (a,0)$ with $a > 0$, $f$ is quasi-differentiable since $f'(x;\cdot) = \psi^*_K(\cdot)$ with $K = [(0,-1),(0,1)]$. $f$ is also locally Lipschitz, and it is not hard to check that $f$ is everywhere semismooth. However, $\partial f$ is not d.u.s.c. since $\partial f(0) = K$ but $(0,0) \in \partial f(0,b)$ for all $b \ne 0$.

A locally Lipschitz function $f : R^n \to R$ will be called regular at $x$ [Clarke, 2] provided that $f'(x;\cdot) = \psi^*_{\partial f(x)}(\cdot)$. Clearly this is a stronger property than quasi-differentiability. The function $f$ of the previous paragraph is not regular at 0, so it is natural to ask whether semismoothness plus regularity implies the submonotonicity of $\partial f$. This time the answer is affirmative:

(3.4) PROPOSITION. $\partial f$ is submonotone at $x$ if, and only if, $f$ is semismooth and regular at $x$.

Proof. Suppose $f$ is semismooth and regular at $x$. If $x_n \to_u x$ ($u \ne 0$), $y_n \in \partial f(x_n)$, and $y_n \to y$ then $y \in \partial f(x)$ and

$$\langle y,u \rangle = \lim \langle y_n,u \rangle = f'(x;u) \ \text{(by semismoothness)} = \psi^*_{\partial f(x)}(u) \ \text{(by regularity)}$$

so $y \in \partial f(x)_u$. Hence $\partial f$ is d.u.s.c., hence submonotone at $x$.

The other direction follows by Propositions 3.3 and 3.5. ∎
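The piecewise function of two variables used above as the semismooth, quasi-differentiable counterexample can be explored numerically. The sketch below reads its three branches as $0$ for $a \le 0$, $a^2/4$ for $a > 0$, $|b| \ge a^2/2$, and $|b| - b^2/a^2$ for $a > 0$, $|b| \le a^2/2$; this reading, and the helper names `grad_fd` and `one_sided_slope`, are our assumptions. The checks illustrate that the branches agree on the parabola $|b| = a^2/2$, that the gradient at $(0,b)$ with $b \ne 0$ is approximately $(0,0)$, and that at $(a,0)$ with $a > 0$ the one-sided slope in the directions $(0,\pm 1)$ is approximately $1$.

```python
def f(a, b):
    """Piecewise counterexample function, under the reading stated above."""
    if a <= 0:
        return 0.0
    if abs(b) >= a * a / 2:
        return a * a / 4
    return abs(b) - b * b / (a * a)

def grad_fd(a, b, eps=1e-7):
    """Central finite-difference gradient; meaningful where f is differentiable."""
    return ((f(a + eps, b) - f(a - eps, b)) / (2 * eps),
            (f(a, b + eps) - f(a, b - eps)) / (2 * eps))

def one_sided_slope(a, b, v, t=1e-8):
    """Difference quotient approximating f'((a,b); v) for one small t > 0."""
    return (f(a + t * v[0], b + t * v[1]) - f(a, b)) / t
```

At the origin the same quotient in the direction $(0,1)$ is $0$, whereas the support function of $K = [(0,-1),(0,1)]$ at $(0,1)$ is $1$, which is consistent with the statement that $f$ is not regular at $0$.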

Rockafellar [9] has defined $z \in R^n$ to be a lower semigradient for $f$ at $x$ if

$$\liminf_{\substack{t \downarrow 0 \\ v \to u}} \frac{f(x+tv) - f(x)}{t} \ge \langle u,z \rangle \quad \forall u \in R^n.$$

If such a $z$ exists, $f$ is lower semidifferentiable.


(3.5) PROPOSITION. Let $f : R^n \to R$ be locally Lipschitz, $\partial f$ submonotone at $x$. Then

$$\lim_{\substack{t \downarrow 0 \\ v \to u}} \frac{f(x+tv) - f(x)}{t} = \psi^*_{\partial f(x)}(u) \quad \forall u \in R^n.$$

In particular, $f$ is lower semidifferentiable at $x$ and $\partial f(x)$ is the set of lower semigradients. Also, $f$ is regular at $x$.

Proof. If $u = 0$, this follows easily from the fact that $f$ is locally Lipschitz, so suppose $u \ne 0$. Let $t_n \downarrow 0$, $v_n \to u$. For each $n$, there is $c_n \in (0,1)$ and $y_n \in \partial f(x + c_n t_n v_n)$ such that

$$\frac{f(x+t_n v_n) - f(x)}{t_n} = y_n \cdot v_n.$$

Since $x + c_n t_n v_n \to_u x$, we must have $y_n \cdot u \to \psi^*_{\partial f(x)}(u)$. Thus

$$\lim_n \frac{f(x+t_n v_n) - f(x)}{t_n} = \lim_n y_n \cdot v_n = \lim_n y_n \cdot u = \psi^*_{\partial f(x)}(u).$$

Hence $f$ is lower semidifferentiable and $\partial f(x)$ is the set of lower semigradients. It is then obvious that $f$ is regular at $x$. ∎


The converse of 3.5 is false: $f(x) = x^2 \sin\frac{1}{x}$ is locally Lipschitz, differentiable, and lower semidifferentiable, but $\partial f$ is not submonotone at $x = 0$.

It is also possible for a function to be regular but for $\partial f$ not to be submonotone. Consider, for example, any function $f : R \to R$ satisfying the following properties:

(i) $f(x) = x - x^2$ for $x = \frac{1}{2}, \frac{1}{3}, \dots$

(ii) $f'$ exists and is decreasing on $\big(\frac{1}{n+1}, \frac{1}{n}\big)$, with $f'_+\big(\frac{1}{n+1}\big) = 1$ and $f'_-\big(\frac{1}{n}\big) = 0$, $n = 2, 3, \dots$

(iii) $f(x) = \frac{1}{4}$ for $x \ge \frac{1}{2}$ and $f(0) = 0$

(iv) $f(-x) = f(x)$ for all $x$.

Since $|x| - x^2 \le f(x) \le |x|$ for all $x$, $f'(0;u) = |u|$ for all $u$. Also, $\partial f(0) = [-1,1]$ so $f$ is regular at 0. But $\partial f$ is clearly not submonotone at 0. Note that the behavior of $f$ is nice at all points $x \ne 0$.

Since the property of strict submonotonicity is central to this paper, it is useful to mention an example of a function $f : R^2 \to R$ such that $\partial f$ is submonotone everywhere, but is not strictly submonotone. The function is

$$f(x,y) = \begin{cases} |y| & \text{if } x \le 0 \\ |y| - x^2 & \text{if } x > 0,\ |y| \ge x^2 \\ \dfrac{x^4 - y^2}{2x^2} & \text{if } x > 0,\ |y| \le x^2 \end{cases}$$

It is easily checked that $f$ is locally Lipschitz, that $\partial f$ is everywhere submonotone, and $\partial f(0,0) = [(0,-1),(0,1)]$. If we let $x_n = \big(\frac{1}{n}, \frac{1}{n^2}\big)$, $x'_n = \big(\frac{1}{n}, -\frac{1}{n^2}\big)$, $y_n = \big(\frac{2}{n}, -1\big)$, $y'_n = \big(\frac{2}{n}, 1\big)$, $n = 1,2,\dots$, and $u = (1,0)$, then $x_n \to_u 0$, $x'_n \to_u 0$, $y_n \in \partial f(x_n)$, and $y'_n \in \partial f(x'_n)$ for all $n$. However,

$$\frac{\langle x_n - x'_n,\ y_n - y'_n \rangle}{|x_n - x'_n|} = -2 \quad \text{for all } n$$

so $\partial f$ is not strictly submonotone.
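The value $-2$ of the quotient can be reproduced with a few lines of Python. The sketch assumes the sequences are read as $x_n = (1/n, 1/n^2)$, $x'_n = (1/n, -1/n^2)$, $y_n = (2/n, -1)$, $y'_n = (2/n, 1)$, and that the branch of $f$ inside the region $|y| \le x^2$ is $(x^4 - y^2)/(2x^2)$; both readings, and the function names, are our assumptions. The helper `grad_inner` evaluates the gradient of that inner branch, whose value on the parabola $y = x^2$ is the subgradient $y_n$ used in the example.

```python
def grad_inner(x, y):
    """Gradient of (x**4 - y**2) / (2 * x**2), the assumed inner branch of f."""
    return (x + y * y / x ** 3, -y / x ** 2)

def quotient(n):
    """Evaluate <x_n - x'_n, y_n - y'_n> / |x_n - x'_n| for the example sequences."""
    x, xp = (1 / n, 1 / n ** 2), (1 / n, -1 / n ** 2)
    y, yp = (2 / n, -1.0), (2 / n, 1.0)
    dx = (x[0] - xp[0], x[1] - xp[1])
    dy = (y[0] - yp[0], y[1] - yp[1])
    inner = dx[0] * dy[0] + dx[1] * dy[1]
    norm = (dx[0] ** 2 + dx[1] ** 2) ** 0.5
    return inner / norm
```

Because $x_n - x'_n$ is vertical and the second components of $y_n$ and $y'_n$ differ by $-2$, the quotient does not depend on $n$, so it cannot have a nonnegative lim inf.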
IV. Lower-$C^1$ functions

In this section, we characterize the class of "lower-$C^1$ functions" in terms of their Clarke gradients. $f : R^n \to R$ is lower-$C^1$ provided $f$ can be represented locally as $f(x) = \max_{s \in S} g(x,s)$, where $S$ is compact and $g$ and $\nabla_x g$ are continuous jointly in $x$ and $s$. In Theorem 4.9, it is demonstrated that a locally Lipschitz $f$ is lower-$C^1$ if, and only if, $\partial f$ is strictly submonotone.

The term "lower-$C^1$ function" was suggested to us by Professor R. T. Rockafellar.

(4.1) LEMMA. Let $f : R^n \to R$ be locally Lipschitz, $x, y \in R^n$. For every $\varepsilon > 0$, there are neighborhoods $U$ of $x$ and $V$ of $y$ such that if $x' \in U$ and $y' \in V$, then $|\psi^*_{\partial f(x')}(y) - \psi^*_{\partial f(x')}(y')| \le \varepsilon$.

Proof. Let $\kappa$ be a Lipschitz constant for $f$ on a neighborhood $U$ of $x$. Then $\partial f(x') \subset \kappa B$ for all $x' \in U$, and it follows that $\kappa$ is a (global) Lipschitz constant for $\psi^*_{\partial f(x')}(\cdot)$. Take $V$ to be the open ball of radius $\varepsilon/\kappa$ centered at $y$. ∎

(4.2) LEMMA. Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz.
Then

$$
\liminf_{\substack{x' \to x \\ t \downarrow 0}}
\left[ \frac{f(x'+ty) - f(x')}{t} - \psi^*_{\partial f(x')}(y) \right] \ge 0,
\quad \forall y \in \mathbb{R}^n
\tag{4.3}
$$

if, and only if, for any compact $K \subset \mathbb{R}^n$ and any
$\varepsilon > 0$, there is a neighborhood $U$ of $x$ and $\lambda > 0$
such that

$$
\frac{f(x'+ty') - f(x')}{t} - \psi^*_{\partial f(x')}(y') \ge -\varepsilon
\tag{4.4}
$$

whenever $x' \in U$, $y' \in K$, $0 < t < \lambda$.


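Proof. Assume 4.3 holds, and fix $K \subset \mathbb{R}^n$ and
$\varepsilon > 0$. Since $f$ is locally Lipschitz, 4.3 implies

$$
\liminf_{\substack{x' \to x \\ y' \to y \\ t \downarrow 0}}
\left[ \frac{f(x'+ty') - f(x')}{t} - \psi^*_{\partial f(x')}(y) \right] \ge 0,
\quad \forall y \in \mathbb{R}^n.
$$

This, and Lemma 4.1, imply that for each $y \in K$ we may find
neighborhoods $U_y$ of $x$, $V_y$ of $y$, and $\lambda_y > 0$ such that

$$
\psi^*_{\partial f(x')}(y) - \psi^*_{\partial f(x')}(y') \ge -\varepsilon/2
$$

and

$$
\frac{f(x'+ty') - f(x')}{t} - \psi^*_{\partial f(x')}(y) \ge -\varepsilon/2
$$

whenever $x' \in U_y$, $y' \in V_y$, and $0 < t < \lambda_y$. Pick a
finite subcover $V_{y_1}, \ldots, V_{y_m}$ for $K$, and let
$U = U_{y_1} \cap \cdots \cap U_{y_m}$ and
$\lambda = \min(\lambda_{y_1}, \ldots, \lambda_{y_m})$. For any $x' \in U$,
$y' \in K$, and $t \in (0,\lambda)$, let $i$ be such that
$y' \in V_{y_i}$, and we get

$$
\begin{aligned}
\frac{f(x'+ty') - f(x')}{t} - \psi^*_{\partial f(x')}(y')
&= \left[ \frac{f(x'+ty') - f(x')}{t} - \psi^*_{\partial f(x')}(y_i) \right]
 + \left[ \psi^*_{\partial f(x')}(y_i) - \psi^*_{\partial f(x')}(y') \right] \\
&\ge -\varepsilon/2 - \varepsilon/2 = -\varepsilon,
\end{aligned}
$$

as desired. The opposite direction of the lemma is obvious. ∎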

(4.5) PROPOSITION. If $f : \mathbb{R}^n \to \mathbb{R}$ is locally
Lipschitz, then $\partial f$ is strictly submonotone at $x$ if, and only
if, 4.3 holds.


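Proof. ($\Rightarrow$) If $y = 0$, the assertion is trivial. Without
any loss of generality, we may assume that $|y| = 1$. Fix
$\varepsilon > 0$. Since $\partial f$ is strictly submonotone at $x$,
there is $r > 0$ such that

$$
\frac{\langle x_1 - x_2,\ y_1 - y_2 \rangle}{|x_1 - x_2|} \ge -\varepsilon
$$

whenever $|x_i - x| < 2r$, $y_i \in \partial f(x_i)$ for $i = 1,2$, and
$x_1 \neq x_2$. Let $x'$ and $t$ be chosen so that $|x' - x| < r$ and
$0 < t < r$. We will complete the proof by showing that

$$
\frac{f(x'+ty) - f(x')}{t} - \psi^*_{\partial f(x')}(y) \ge -\varepsilon.
$$

Choose any $y_1 \in \partial f(x')$. By the mean-value theorem of
Lebourg [3], we may find $s \in (0,t)$ and $y_2 \in \partial f(x'+sy)$
such that $f(x'+ty) - f(x') = t\langle y, y_2 \rangle$. Letting
$x_1 = x'$ and $x_2 = x' + sy$, we have $|x_i - x| < 2r$ for $i = 1,2$
and

$$
\frac{f(x'+ty) - f(x')}{t} - \langle y, y_1 \rangle
= \langle y,\ y_2 - y_1 \rangle
= \frac{\langle x_2 - x_1,\ y_2 - y_1 \rangle}{|x_2 - x_1|}
\ge -\varepsilon.
$$

Since $y_1 \in \partial f(x')$ was arbitrary, taking the supremum of
$\langle y, y_1 \rangle$ over $y_1 \in \partial f(x')$ gives the desired
inequality.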


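($\Leftarrow$) Next, suppose 4.3 holds, and let $\varepsilon > 0$ be
given. By Lemma 4.2, there is a neighborhood $U$ of $x$ and
$\lambda > 0$ such that

$$
\frac{f(x'+tu) - f(x')}{t} - \psi^*_{\partial f(x')}(u) \ge -\varepsilon/2
$$

whenever $x' \in U$, $|u| \le 1$, and $0 < t < \lambda$. We may also
assume that $U$ is small enough so that $|z - z'| < \lambda$ for all
$z, z' \in U$. Fix $x_i \in U$, $y_i \in \partial f(x_i)$ for $i = 1,2$,
with $x_1 \neq x_2$. Let $t = |x_2 - x_1|$ and $u = (x_2 - x_1)/t$. Then

$$
\begin{aligned}
\frac{\langle x_1 - x_2,\ y_1 - y_2 \rangle}{|x_1 - x_2|}
&= -\langle u, y_1 \rangle - \langle -u, y_2 \rangle \\
&\ge -\psi^*_{\partial f(x_1)}(u) - \psi^*_{\partial f(x_2)}(-u) \\
&= \left[ \frac{f(x_1+tu) - f(x_1)}{t} - \psi^*_{\partial f(x_1)}(u) \right]
 + \left[ \frac{f(x_2-tu) - f(x_2)}{t} - \psi^*_{\partial f(x_2)}(-u) \right] \\
&\ge -\frac{\varepsilon}{2} - \frac{\varepsilon}{2} = -\varepsilon,
\end{aligned}
$$

which shows that $\partial f$ is strictly submonotone at $x$. ∎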
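Condition 4.3 can be probed numerically in one dimension. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the two test functions $|x|$ and $-|x|$, whose Clarke gradients are known in closed form, and samples the bracket in 4.3 on a small grid around $x = 0$. The helper names (`support_abs`, `support_neg_abs`, `worst_gap`) are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)


def support_abs(xp, y):
    """Support function at y of the Clarke gradient of |x| at xp."""
    return abs(y) if xp == 0 else sign(xp) * y


def support_neg_abs(xp, y):
    """Same for -|x|: the gradient is -sign(xp) away from 0 and [-1, 1] at 0."""
    return abs(y) if xp == 0 else -sign(xp) * y


def worst_gap(f, support, radius=1e-2, steps=21):
    """Smallest sampled value of (f(x'+t*y) - f(x'))/t - support(x', y)."""
    half = steps // 2
    xs = [radius * (k - half) / half for k in range(steps)]  # includes x' = 0
    ts = [radius * 2.0 ** (-j) for j in range(1, 8)]
    return min((f(xp + t * y) - f(xp)) / t - support(xp, y)
               for xp in xs for t in ts for y in (-1.0, 1.0))


gap_abs = worst_gap(abs, support_abs)
gap_neg_abs = worst_gap(lambda v: -abs(v), support_neg_abs)
```

For $|x|$ the sampled gap stays at 0 up to rounding, consistent with 4.3, while for $-|x|$ it drops to about $-2$ (attained at $x' = 0$), so 4.3 fails there and $\partial(-|x|)$ is not strictly submonotone at 0.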
(4.6) LEMMA. Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally Lipschitz,
let $C$ and $K$ be compact sets in $\mathbb{R}^n$, and suppose that
$\partial f$ is strictly submonotone on $C$. Then

$$
\liminf_{\substack{x \in C \\ y \in K \\ t \downarrow 0}}
\left[ \frac{f(x+ty) - f(x)}{t} - \psi^*_{\partial f(x)}(y) \right] \ge 0.
$$


Proof. Let $\varepsilon > 0$ be given. By Proposition 4.5 and
Lemma 4.2, for each $x \in C$, there is $\lambda_x > 0$ such that

$$
\frac{f(x'+ty) - f(x')}{t} - \psi^*_{\partial f(x')}(y) \ge -\varepsilon
$$

whenever $|x' - x| < \lambda_x$, $y \in K$, and $0 < t < \lambda_x$. Let
$x_1, \ldots, x_r \in C$ be such that for every $x \in C$ we have
$|x - x_i| < \lambda_{x_i}$ for some $i$. Let
$\lambda = \min(\lambda_{x_1}, \ldots, \lambda_{x_r})$. Then for any
$x \in C$, $y \in K$, we have

$$
\frac{f(x+ty) - f(x)}{t} - \psi^*_{\partial f(x)}(y) \ge -\varepsilon
$$

whenever $0 < t < \lambda$. ∎

(4.7) LEMMA. Let $\phi(t)$ be real-valued, defined for $t > 0$
sufficiently small, such that $\lim_{t \to 0} \phi(t) = 0$. Then there is
a continuously differentiable function $\alpha(t)$ defined on $[0,a]$ for
some $a > 0$ such that

$$
\alpha(0) = \alpha'(0) = 0,
\qquad
\alpha(t) \ge t\,\phi(t), \quad \forall t \in (0,a].
$$
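As a quick sanity check of what Lemma 4.7 asks for (an illustrative instance, not taken from the paper), take $\phi(t) = \sqrt{t}$. One admissible choice is then $\alpha(t) = t^{3/2}$:

$$
\alpha(0) = 0, \qquad
\alpha'(0) = \lim_{t \downarrow 0} \frac{\alpha(t)}{t} = \lim_{t \downarrow 0} \sqrt{t} = 0, \qquad
\alpha'(t) = \tfrac{3}{2}\sqrt{t} \to 0 \ \ (t \downarrow 0), \qquad
\alpha(t) = t\,\phi(t).
$$

In this instance $t\,\phi(t)$ is already continuously differentiable, so no construction is needed. The lemma matters for functions $\phi$, for example oscillating ones, for which $t\,\phi(t)$ itself is not continuously differentiable and a smooth majorant has to be built.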

Proof. Let $a > 0$ be such that $\phi$ is bounded above on $(0,2a]$,
and let $a_k = a/2^k$, $k = 0,1,\ldots$. If $\beta$ is the infimum of all
affine functions $\ell : \mathbb{R} \to \mathbb{R}$ which satisfy
$\ell(a_k) \ge \phi(t)$ for all $t \in (0,2a_k]$ and all
$k = 0,1,2,\ldots$, then the following properties are easily checked:

$\beta$ is continuous, concave, nondecreasing on $[0,a]$

$\beta(0) = 0$

$\beta \ge \phi$ on $(0,a]$

$\beta$ is affine on $[a_{k+1}, a_k]$, $k = 0,1,2,\ldots$.

Also, the right derivative $\beta'_+$ of $\beta$ has these properties:

$\beta'_+$ is finite, nonnegative, and nonincreasing on $(0,a)$

$\beta'_+$ is constant on $[a_{k+1}, a_k)$, $k = 0,1,2,\ldots$

$\beta'_+$ is integrable on $[0,a]$.

This last assertion is proven as follows. Whenever $0 < u < v < a$,

$$
\beta(v) - \beta(u) = \int_u^v \beta'_+(s)\,ds
$$

(cf. Rockafellar [6, 24.2.1]). Since $\beta'_+ \ge 0$ and $\beta$ is
continuous,

$$
\int_0^a \beta'_+(s)\,ds = \lim_{u \downarrow 0} \int_u^a \beta'_+(s)\,ds
= \beta(a) - \beta(0) < \infty,
$$

so $\beta'_+$ is integrable. Note that since $\beta(0) = 0$,
$\beta(t) = \int_0^t \beta'_+(s)\,ds$ for all $t \in [0,a]$.

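For each $k = 1,2,\ldots$, pick $c_k$ such that

$$
\tfrac{1}{2}(a_k + a_{k+1}) < c_k < a_k,
\qquad
(a_k - c_k)\bigl(\beta'_+(a_{k+1}) - \beta'_+(a_k)\bigr) < a_{k+1}.
$$

Define $\mu : (0,a) \to \mathbb{R}$ to be the function that agrees with
$1 + \beta'_+$ on the intervals $[a_{k+1}, c_k]$ ($k = 1,2,\ldots$) and on
$[a_1, a_0)$, and is affine on the intervals $[c_k, a_k]$
($k = 1,2,\ldots$). Then $\mu$ is continuous, nonnegative, and
nonincreasing on $(0,a)$ and

$$
\int_{a_{k+1}}^{t} \bigl(\mu(s) - \beta'_+(s)\bigr)\,ds \ge 0
\quad \text{for all } k = 0,1,2,\ldots,\ t \in [a_{k+1}, a_k].
$$

Since $0 \le \mu \le \beta'_+ + 1$ and $\beta'_+$ is integrable, it
follows that $\mu$ is integrable. Then for all $t \in [0,a]$,

$$
\int_0^t \mu(s)\,ds \ge \int_0^t \beta'_+(s)\,ds = \beta(t).
$$

Define $\alpha(t) = t \int_0^t \mu(s)\,ds$ for all $t \in [0,a]$. Clearly,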
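$\alpha$ is continuously differentiable on $(0,a]$

$\alpha(0) = 0$

$\alpha(t) \ge t\,\phi(t)$ for $t \in (0,a]$.

It remains only to show that $\alpha$ is continuously differentiable at
0. We have

$$
\alpha'(0) = \lim_{t \to 0^+} \frac{\alpha(t)}{t}
= \lim_{t \downarrow 0} \int_0^t \mu(s)\,ds = 0.
$$

Also, for $t > 0$,

$$
\begin{aligned}
\alpha'(t) &= \int_0^t \mu(s)\,ds + t\,\mu(t) \\
&= \int_0^t \bigl(\mu(s) + \mu(t)\bigr)\,ds \\
&\le 2 \int_0^t \mu(s)\,ds \quad \text{(since $\mu$ is nonincreasing)},
\end{aligned}
$$

and the right-hand side tends to 0 as $t \downarrow 0$ because $\mu$ is
integrable. Hence $\alpha'$ is continuous at 0. ∎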

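(4.8) PROPOSITION. Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally
Lipschitz, let $C \subset \mathbb{R}^n$ be compact, and suppose that
$\partial f$ is strictly submonotone on $C$. Then there are $a > 0$ and a
continuously differentiable function $\alpha : [0,a] \to \mathbb{R}$ with
$\alpha(0) = \alpha'(0) = 0$ such that

$$
f(x + ty) \ge f(x) + t\,\psi^*_{\partial f(x)}(y) - \alpha(t)
$$

whenever $x \in C$, $|y| = 1$, and $0 < t \le a$.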
Proof. For $t > 0$, define

$$
\phi(t) = -\inf_{0 < t' \le t}\ \min_{\substack{x \in C \\ |y| = 1}}
\left( \frac{f(x+t'y) - f(x)}{t'} - \psi^*_{\partial f(x)}(y) \right).
$$

Then $\phi \ge 0$ and by Lemma 4.6, $\lim_{t \to 0} \phi(t) = 0$. By
Lemma 4.7, there is a real-valued function $\alpha(t)$ which is
continuously differentiable on $[0,a]$ for some $a > 0$ such that
$\alpha(0) = \alpha'(0) = 0$ and $\alpha(t) \ge t\,\phi(t)$ for all
$t \in (0,a]$. It follows that
$f(x + ty) \ge f(x) + t\,\psi^*_{\partial f(x)}(y) - \alpha(t)$ whenever
$x \in C$, $|y| = 1$ and $0 < t \le a$. ∎

(4.9) THEOREM. Let $f : \mathbb{R}^n \to \mathbb{R}$ be locally
Lipschitz. $\partial f$ is strictly submonotone if, and only if, for
every $\bar{x} \in \mathbb{R}^n$ there is a neighborhood $U$ of
$\bar{x}$, a compact set $S$ and a continuous function
$g : U \times S \to \mathbb{R}$ such that $\nabla_x g(x,s)$ exists and is
continuous in $(x,s)$ and such that

$$
f(x) = \max_{s \in S} g(x,s) \quad \forall x \in U.
$$

Proof. ($\Rightarrow$) Suppose $\partial f$ is strictly submonotone,
and fix $\bar{x} \in \mathbb{R}^n$. By Proposition 4.8, there is
$a > 0$, and a function $\alpha : [0,a] \to \mathbb{R}$ such that
$\alpha(0) = \alpha'(0) = 0$ and

$$
f(x+y) \ge f(x) + \langle \zeta, y \rangle - \alpha(|y|)
$$

whenever $|x - \bar{x}| \le 1$, $|y| \le a$, and
$\zeta \in \partial f(x)$. Let $b = \min\{1, a/2\}$. Then

$$
f(x) \ge f(x') + \langle x - x', \zeta \rangle - \alpha(|x - x'|)
$$

whenever $|x - \bar{x}| < b$, $|x' - \bar{x}| \le b$, and
$\zeta \in \partial f(x')$. Let $U = \{x : |x - \bar{x}| < b\}$ and
$S = \{(x',\zeta) : |x' - \bar{x}| \le b,\ \zeta \in \partial f(x')\}$.
If we define

$$
g(x, x', \zeta) = f(x') + \langle x - x', \zeta \rangle - \alpha(|x - x'|),
$$

then $g$ has the desired properties.

($\Leftarrow$) Fix $\bar{x} \in \mathbb{R}^n$, let $U$, $S$, and $g$
be as indicated, and let $K \subset U$ be a compact convex neighborhood
of $\bar{x}$. By compactness, $\nabla_x g(x,s)$ is uniformly continuous
on $K \times S$. So, defining for $t > 0$

$$
\eta(t) = \sup_{\substack{z, z' \in K \\ s \in S \\ |z - z'| \le t}}
|\nabla_x g(z,s) - \nabla_x g(z',s)|,
$$

we have $\lim_{t \downarrow 0} \eta(t) = 0$. By Lemma 4.7 there is, for
some $a > 0$, a function $\alpha : [0,a] \to \mathbb{R}$ such that
$\alpha(0) = \alpha'(0) = 0$ and $\alpha(t) \ge t\,\eta(t)$ for all
$t \in (0,a]$.

Fix $x, x' \in K$ such that $x \neq x'$. For each $s \in S$, by the
mean-value theorem, there is $x'' \in K$ on the line segment $(x,x')$
such that $g(x',s) - g(x,s) = (x' - x) \cdot \nabla_x g(x'',s)$. Then
$$
\begin{aligned}
\frac{g(x',s) - g(x,s) - (x'-x) \cdot \nabla_x g(x,s)}{|x' - x|}
&= \bigl(\nabla_x g(x'',s) - \nabla_x g(x,s)\bigr) \cdot \frac{x' - x}{|x' - x|} \\
&\ge -\eta(|x'' - x|) \ge -\eta(|x' - x|)
\ge -\frac{\alpha(|x' - x|)}{|x' - x|}.
\end{aligned}
$$

Hence, for all $s \in S$,

$$
g(x',s) \ge g(x,s) + (x' - x) \cdot \nabla_x g(x,s) - \alpha(|x' - x|).
$$

Let $\zeta \in \partial f(x)$ be arbitrary. By Clarke [1, Theorem 2.1],
we may find $s_1, \ldots, s_k \in S$ and numbers
$\lambda_1, \ldots, \lambda_k$ such that

$$
\zeta = \sum_i \lambda_i \nabla_x g(x,s_i),
\qquad
\lambda_i \ge 0, \quad \sum_i \lambda_i = 1, \quad g(x,s_i) = f(x).
$$

Then

$$
\begin{aligned}
f(x') &\ge \sum_i \lambda_i\, g(x',s_i) \\
&\ge \sum_i \lambda_i \bigl( g(x,s_i) + (x'-x) \cdot \nabla_x g(x,s_i)
 - \alpha(|x'-x|) \bigr) \\
&= f(x) + (x'-x) \cdot \zeta - \alpha(|x'-x|).
\end{aligned}
$$

Since this holds for all $\zeta \in \partial f(x)$, we have shown that
for all $x, x' \in K$ with $x \neq x'$, we have

$$
f(x') \ge f(x) + \psi^*_{\partial f(x)}(x' - x) - \alpha(|x' - x|).
$$

It then follows easily by Proposition 4.5 that $\partial f$ is strictly
submonotone at every interior point of $K$, and hence in particular at
$\bar{x}$. ∎
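The representation built in the first half of the proof of Theorem 4.9 can be tried out on a small example. The sketch below is a hedged illustration, not the paper's own computation: it assumes the one-dimensional function $f(x) = |x| - x^2$ around $\bar{x} = 0$ and the modulus $\alpha(t) = t^2$ (which satisfies $\alpha(0) = \alpha'(0) = 0$ and works for this particular $f$), replaces the set $S$ of pairs $(x', \zeta)$ by a finite grid, and uses only the endpoints of each Clarke gradient interval because $g$ is affine in $\zeta$. All names are hypothetical.

```python
def f(x):
    """Illustrative function (an assumption, not from the paper): |x| - x^2."""
    return abs(x) - x * x


def clarke_endpoints(xp):
    """Endpoints of the interval that is the Clarke gradient of f at xp."""
    if xp > 0:
        return (1.0 - 2.0 * xp,)
    if xp < 0:
        return (-1.0 - 2.0 * xp,)
    return (-1.0, 1.0)


def alpha(t):
    """Assumed modulus with alpha(0) = alpha'(0) = 0."""
    return t * t


def g(x, xp, zeta):
    """g(x, x', zeta) = f(x') + <x - x', zeta> - alpha(|x - x'|)."""
    return f(xp) + (x - xp) * zeta - alpha(abs(x - xp))


B = 0.5
GRID = [B * k / 50.0 for k in range(-50, 51)]
# Discretized index set S = {(x', zeta) : |x'| <= B, zeta in Clarke gradient at x'}.
S = [(xp, zeta) for xp in GRID for zeta in clarke_endpoints(xp)]


def f_from_max(x):
    """Maximum of g(x, s) over the discretized index set."""
    return max(g(x, xp, zeta) for xp, zeta in S)


max_error = max(abs(f_from_max(x) - f(x)) for x in GRID)
```

On the grid the maximum of $g(x,\cdot)$ reproduces $f(x)$ up to rounding: the index $(x', \zeta) = (x, \zeta)$ gives $g = f(x)$, and every other index gives a value no larger than $f(x)$.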